{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Databases "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Introduction\n",
"------------\n",
"\n",
"Many of you will deal with complex data — and often, lots of it. Ecological and Evolutionary data are particularly complex because they contain large numbers of attributes, often measured in very different scales and units for individual taxa, populations, etc. In this scenario, storing the data in a database makes a lot of sense! You can\n",
"easily include the database in your analysis workflow — indeed, that's why people use databases. And you can use python (and R) to build, manipulate and use your database.\n",
"\n",
"### Relational databases\n",
"\n",
"\n",
"A *relational* database is a collection of interlinked (*related*) tables that altogether store a complex dataset\n",
"in a logical, computer-readable format. Dividing a dataset into multiple tables minimizes redundancies. For example, if your data were sampled from three sites — then, rather than repeating the site name and description in each row in a text file, you could just specify a numerical \"key\" that directs to another table containing the sampling site name and description.\n",
"\n",
"Finally, if you have many rows in your data file, the type of sequential access we have been using in our `python` and `R` scripts is inefficient — you should be able to instantly access any row regardless of its position\n",
"\n",
"Data columns in a database are usually called *fields*, while the rows are the *records*. Here are a few things to\n",
"keep in mind about databases:\n",
"\n",
"* Each field typically contains only one data type (e.g., integers, floats, strings)\n",
"* Each record is a \"data point\", composed of different values, one for each field — somewhat like a python tuple\n",
"* Some fields are special, and are called *keys*:\n",
" * The *primary key* uniquely defines a record in a table (e.g., each row is identified by a unique number)\n",
" * To allow fast retrieval, some fields (and typically all the keys) are indexed — a copy of certain columns that can be searched very efficiently.\n",
" * *Foreign keys* are keys in a table that are primary keys in another table and define relationships between the tables\n",
"\n",
"The key to designing a database is to minimize redundancy and dependency without losing the logical consistency of tables — this is called *normalization* (arguably more of an art than a science!)\n",
"\n",
"Let's look at a simple example.\n",
"\n",
"Imagine you recorded body sizes of species from different field sites in a single text file (e.g., a `.csv` file) with the following fields:\n",
"\n",
"|Field|Definition|\n",
"|:-|:-|\n",
"|`ID` | Unique ID for the record|\n",
"|`SiteName` | Name of the site|\n",
"|`SiteLong` | Longitude of the site|\n",
"|`SiteLat` | Latitude of the site|\n",
"|`SamplingDate` | Date of the sample|\n",
"|`SamplingHour` | Hour of the sampling|\n",
"|`SamplingAvgTemp` | Average air temperature on the sampling day|\n",
"|`SamplingWaterTemp` | Temperature of the water|\n",
"|`SamplingPH` | PH of the water|\n",
"|`SpeciesCommonName`| Species of the sampled individual|\n",
"|`SpeciesLatinBinom`| Latin binomial of the species|\n",
"|`BodySize` | Width of the individual|\n",
"|`BodyWeight` | Weight of the individual|\n",
"\n",
"\n",
"It would be logical to divide the data into four tables:\n",
"\n",
"*Site table*:\n",
"\n",
"|Field|Definition|\n",
"|:-|:-|\n",
"|`SiteID` |ID for the site|\n",
"|`SiteName`| Name of the site|\n",
"|`SiteLong` | Longitude of the site|\n",
"|`SiteLat` | Latitude of the site|\n",
"\n",
"\n",
"*Sample table*:\n",
"\n",
"|Field|Definition|\n",
"|:-|:-|\n",
"|`SamplingID` | ID for the sampling date|\n",
"|`SamplingDate` | Date of the sample|\n",
"|`SamplingHour` | Hour of the sample|\n",
"|`SamplingAvgTemp` |Average air temperature|\n",
"|`SamplingWaterTemp`| Temperature of the water|\n",
"|`SamplingPH` | PH of the water|\n",
"\n",
"\n",
"*Species table*:\n",
"\n",
"|Field|Definition|\n",
"|:-|:-|\n",
"|`SpeciesID` | ID for the species|\n",
"|`SpeciesCommonName`| Species name|\n",
"|`SpeciesLatinBinom` | Latin binomial of the species|\n",
"\n",
"\n",
"*Individual table*:\n",
"\n",
"|Field|Definition|\n",
"|:-|:-|\n",
"|`IndividualID`| ID for the individual sampled|\n",
"|`SpeciesID` | ID for the species|\n",
"|`SamplingID` |ID for the sampling day|\n",
"|`SiteID` | ID for the site|\n",
"|`BodySize` | Width of the individual|\n",
"|`BodyWeight` | Weight of the individual|\n",
"\n",
"In each table, the first ID field is the primary key. The last table contains three foreign keys because each individual is associated with one species, one sampling day and one sampling site.\n",
"\n",
"These structural features of a database are called its *schema*.\n",
"\n",
"## SQLite\n",
"\n",
"`SQLite` is a simple (and very popular) SQL (Structured Query Language)-based solution for managing localized, personal databases. I can safely bet that most, if not all of you unknowingly (or knowingly!) use `SQLite` — it is used by MacOSX, Firefox, Acrobat Reader,iTunes, Skype, iPhone, etc. SQLite is also the database \"engine\" underlying your [Silwood Masters Web App](http://silwoodmasters.co.uk)\n",
"\n",
"We can easily use SQLite through Python scripts. First, install SQLite by typing in the Ubuntu terminal:\n",
"\n",
"```bash\n",
"sudo apt install sqlite3 libsqlite3-dev\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, make sure that you have the necessary package for python by typing `import sqlite3` in the python or ipython shell. Finally, you may install a GUI for SQLite3 :\n",
"\n",
"`sudo apt install sqliteman`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now type `sqlite3` in the Ubuntu terminal to check if SQLite successfully launches.\n",
"\n",
"SQLite has very few data types (and lacks a boolean and a date type):\n",
"\n",
"|Field Data Type| Definition|\n",
"|:-|:-|\n",
"|`NULL` | The value is a NULL value |\n",
"|`INTEGER` | The value is a signed integer, stored in up to or 8 bytes |\n",
"| `REAL` | The value is a floating point value, stored as in 8 bytes |\n",
"| `TEXT` | The value is a text string |\n",
"| `BLOB` | The value is a blob of data, stored exactly as it was input (useful for binary types, such as bitmap images or pdfs) |\n",
"\n",
"Typically, you will build a database by importing csv data — be aware that:\n",
"\n",
"* Headers: the csv should have no headers\n",
"* Separators: if the comma is the separator, each record should not contain any other commas\n",
"* Quotes: there should be no quotes in the data\n",
"* Newlines: there should be no newlines\n",
"\n",
"Now build your first database in SQLite! We will use as example a global dataset on metabolic traits called *Biotraits* that we are currently developing in our lab (should be in your `Data` directory). This dataset contains 164 columns (fields). Thermal response curves for different traits and species are stored in rows. This means\n",
"that site description or taxonomy are repeated as many times as temperatures are measured in the curve. You can imagine how much redundacy can be here!!!\n",
"\n",
"For this reason, it is easier to migrate the dataset to SQL and split it into several tables:\n",
"\n",
"* *TCP*: Includes the thermal curve performance for each species and trait (as many rows per trait and species as temperatures have been measured within the TCP)\n",
"\n",
"* *TraitInfo*: Contains site description and conditions under the traits were measured (one row per thermal curve)\n",
"\n",
"* Consumer: Consumer description including taxonomy (one row per thermal\n",
"curve).\n",
"\n",
"* Resource: Resource description including taxonomy (one row per thermal\n",
"curve).\n",
"\n",
"* Size: Size data for each species (one row per thermal curve)\n",
"\n",
"DataSource: Contains information about the data source (citation, contributors) (one row per thermal curve).\n",
"\n",
"So all these tables compose the *Biotraits* `schema`.\n",
"\n",
"In an Linux/Unix terminal, navigate to your `data` directory:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, launch a new database using sqlite:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite3 Biotraits.db\n",
"```\n",
"\n",
"This should return something like:\n",
"\n",
"```sql\n",
"SQLite version 3.11.0 2016-02-15 17:29:24\n",
"Enter \".help\" for usage hints.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This creates an empty database in your `data` directory. \n",
"\n",
"You should now see the sqlite cursor (`sqlite>`), and will be entering your commands there. \n",
"\n",
"Now we need to create a table with some fields. Let's start with the *TraitInfo* table (enter these one line at a time, without the `...>`):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> CREATE TABLE TraitInfo (Numbers integer primary key,\n",
" ...> OriginalID text,\n",
" ...> FinalID text,\n",
" ...> OriginalTraitName text,\n",
" ...> OriginalTraitDef text,\n",
" ...> Replicates integer,\n",
" ...> Habitat integer, \n",
" ...> Climate text,\n",
" ...> Location text,\n",
" ...> LocationType text,\n",
" ...> LocationDate text,\n",
" ...> CoordinateType text,\n",
" ...> Latitude integer,\n",
" ...> Longitude integer);\n",
"``` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that I am writing all SQL commands in upper case, but it is not necessary. I am using upper case here because SQL syntax is long and clunky, and it quickly becomes hard to spot (and edit) commands in long strings of complex queries.\n",
"\n",
"Now let's import the dataset:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sqlite> .mode csv`\n",
"\n",
"`sqlite> .import TraitInfo.csv TraitInfo`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So we built a table and imported a csv file into it. Now we can ask\n",
"SQLite to show all the tables we currently have:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sqlite> .tables`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's run our first *Query* (note that you need a semicolon\n",
"to end a command):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sqlite> SELECT * FROM TraitInfo LIMIT 5;`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's turn on some nicer formatting:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sqlite> .mode column`\n",
"\n",
"`sqlite> .header ON`\n",
"\n",
"`sqlite> SELECT * FROM TraitInfo LIMIT 5;`\n",
"\n",
"You should see something like: \n",
"\n",
"```bash\n",
"Numbers OriginalID FinalID OriginalTraitName ... \n",
"------- ---------- ---------- ------------------------- ...\n",
"1 1 MTD1 Resource Consumption Rate ...\n",
"4 2 MTD2 Resource Consumption Rate ...\n",
"6 3 MTD3 Resource Consumption Rate ...\n",
"9 4 MTD4 Resource Mass Consumption ...\n",
"12 5 MTD5 Resource Mass Consumption ...\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main statement to select records from a table is\n",
"`SELECT`:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sqlite> .width 40 ## NOTE: Control the width`\n",
"\n",
"`sqlite> SELECT DISTINCT OriginalTraitName FROM TraitInfo; # Returns unique values`\n",
"\n",
"Which gives:\n",
"\n",
"```bash\n",
"OriginalTraitName \n",
"----------------------------------------\n",
"Resource Consumption Rate \n",
"Resource Mass Consumption Rate \n",
"Mass-Specific Mass Consumption Rate \n",
"Voluntary Body Velocity \n",
"Forward Attack Distance \n",
"Foraging Velocity \n",
"Resource Reaction Distance \n",
"....\n",
"```\n",
"\n",
"Now try these: \n",
"\n",
"```bash\n",
"sqlite> SELECT DISTINCT Habitat FROM TraitInfo\n",
" ...> WHERE OriginalTraitName = \"Resource Consumption Rate\"; # Sets a condition`\n",
"\n",
"Habitat \n",
"----------------------------------------\n",
"freshwater \n",
"marine \n",
"terrestrial \n",
"\n",
"sqlite> SELECT COUNT (*) FROM TraitInfo; # Returns number of rows\n",
"\n",
"Count (*) \n",
"--------------------\n",
"2336\n",
"\n",
"sqlite> SELECT Habitat, COUNT(OriginalTraitName) # Returns number of rows for each group\n",
" ...> FROM TraitInfo GROUP BY Habitat;\n",
"\n",
"Habitat COUNT(OriginalTraitName)\n",
"---------- ------------------------\n",
"NA 16 \n",
"freshwater 609 \n",
"marine 909 \n",
"terrestria 802 \n",
"\n",
"sqlite> SELECT COUNT(DISTINCT OriginalTraitName) # Returns number of unique values\n",
" ...> FROM TraitInfo;\n",
"\n",
"COUNT(DISTINCT OriginalTraitName)\n",
"---------------------------------\n",
"220 \n",
"\n",
"sqlite> SELECT COUNT(DISTINCT OriginalTraitName) TraitCount # Assigns alias to the variable\n",
" ...> FROM TraitInfo;\n",
"\n",
"TraitCount\n",
"----------\n",
"\n",
"sqlite> SELECT Habitat,\n",
" ...> COUNT(DISTINCT OriginalTraitName) AS TN\n",
" ...> FROM TraitInfo GROUP BY Habitat;\n",
"\n",
"Habitat TN \n",
"---------- ----------\n",
"NA 7 \n",
"freshwater 82 \n",
"marine 95 \n",
"terrestria 96 \n",
"\n",
"sqlite> SELECT * # WHAT TO SELECT\n",
" ...> FROM TraitInfo # FROM WHERE\n",
" ...> WHERE Habitat = \"marine\" # CONDITIONS\n",
" ...> AND OriginalTraitName = \"Resource Consumption Rate\";\n",
"\n",
"Numbers OriginalID FinalID OriginalTraitName ...\n",
"---------- ---------- ---------- ------------------------- ...\n",
"778 308 MTD99 Resource Consumption Rate ...\n",
"798 310 MTD101 Resource Consumption Rate ...\n",
"806 311 MTD102 Resource Consumption Rate ...\n",
"993 351 MTD113 Resource Consumption Rate ...\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The structure of the `SELECT` command is as follows (*Note: **all** characters are case **in**sensitive*):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"SELECT [DISTINCT] field\n",
"FROM table\n",
"WHERE predicate\n",
"GROUP BY field\n",
"HAVING predicate\n",
"ORDER BY field\n",
"LIMIT number\n",
";\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try some more elaborate queries:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> SELECT Numbers FROM TraitInfo LIMIT 5;\n",
"\n",
"Numbers \n",
"----------\n",
"1 \n",
"4 \n",
"6 \n",
"9 \n",
"12 \n",
"\n",
"sqlite> SELECT Numbers \n",
" ...> FROM TraitInfo\n",
" ...> WHERE Numbers > 100 \n",
" ...> AND Numbers < 200;\n",
"\n",
"Numbers \n",
"----------\n",
"107 \n",
"110 \n",
"112 \n",
"115 \n",
"\n",
"sqlite> SELECT Numbers \n",
" ...> FROM TraitInfo\n",
" ...> WHERE Habitat = \"freshwater\"\n",
" ...> AND Number > 700\n",
" ...> AND Number < 800;\n",
"\n",
"Numbers \n",
"----------\n",
"704 \n",
"708 \n",
"712 \n",
"716 \n",
"720 \n",
"725 \n",
"730 \n",
"735 \n",
"740 \n",
"744 \n",
"748 \n",
"``` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also match records using something like regular expressions. In SQL, when we use the command `LIKE`, the percent % symbol matches any sequence of zero or more characters and the underscore matches any single character. Similarly, `GLOB` uses the asterisk and the underscore."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> SELECT DISTINCT OriginalTraitName\n",
" ...> FROM TraitInfo\n",
" ...> WHERE OriginalTraitName LIKE \"_esource Consumption Rate\";\n",
"\n",
"OriginalTraitName \n",
"-------------------------\n",
"Resource Consumption Rate \n",
"\n",
"sqlite> SELECT DISTINCT OriginalTraitName\n",
" ...> FROM TraitInfo\n",
" ...> WHERE OriginalTraitName LIKE \"Resource%\";\n",
"\n",
"OriginalTraitName \n",
"----------------------------------------\n",
"Resource Consumption Rate \n",
"Resource Mass Consumption Rate \n",
"Resource Reaction Distance \n",
"Resource Habitat Encounter Rate \n",
"Resource Consumption Probability \n",
"Resource Mobility Selection \n",
"Resource Size Selection \n",
"Resource Size Capture Intent Acceptance \n",
"Resource Encounter Rate \n",
"Resource Escape Response Probability \n",
"\n",
"sqlite> SELECT DISTINCT OriginalTraitName\n",
" ...> FROM TraitInfo\n",
" ...> WHERE OriginalTraitName GLOB \"Resource*\";\n",
"\n",
"\n",
"OriginalTraitName \n",
"----------------------------------------\n",
"Resource Consumption Rate \n",
"Resource Mass Consumption Rate \n",
"Resource Reaction Distance \n",
"Resource Habitat Encounter Rate \n",
"Resource Consumption Probability \n",
"Resource Mobility Selection \n",
"Resource Size Selection \n",
"Resource Size Capture Intent Acceptance \n",
"Resource Encounter Rate \n",
"Resource Escape Response Probability \n",
"\n",
"# NOTE THAT GLOB IS CASE SENSITIVE, WHILE LIKE IS NOT\n",
"\n",
"sqlite> SELECT DISTINCT OriginalTraitName\n",
" ...> FROM TraitInfo\n",
" ...> WHERE OriginalTraitName LIKE \"resource%\";\n",
"\n",
"OriginalTraitName \n",
"----------------------------------------\n",
"Resource Consumption Rate \n",
"Resource Mass Consumption Rate \n",
"Resource Reaction Distance \n",
"Resource Habitat Encounter Rate \n",
"Resource Consumption Probability \n",
"Resource Mobility Selection \n",
"Resource Size Selection \n",
"Resource Size Capture Intent Acceptance \n",
"Resource Encounter Rate \n",
"Resource Escape Response Probability \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also order by any column:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> SELECT OriginalTraitName, Habitat FROM \n",
" ...> TraitInfo LIMIT 5;\n",
"\n",
"OriginalTraitName Habitat \n",
"------------------------- ----------\n",
"Resource Consumption Rate freshwater\n",
"Resource Consumption Rate freshwater\n",
"Resource Consumption Rate freshwater\n",
"Resource Mass Consumption freshwater\n",
"Resource Mass Consumption freshwater\n",
"\n",
"sqlite> SELECT OriginalTraitName, Habitat FROM \n",
" ...> TraitInfo ORDER BY OriginalTraitName LIMIT 5;\n",
"\n",
"OriginalTraitName Habitat \n",
"-------------------------- ----------\n",
"48-hr Hatching Probability marine \n",
"Asexual Reproduction Rate marine \n",
"Attack Body Acceleration marine \n",
"Attack Body Velocity marine \n",
"Attack Body Velocity marine \n",
" ```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Until now we have just queried data from one single table, but as we\n",
"have seen, the point of storing a database in SQL is that we can use\n",
"multiple tables minimizing redundancies within them. And of course,\n",
"querying data from those different tables at the same time will be\n",
"necessary at some point.\n",
"\n",
"Let's import then one more table to our database:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> CREATE TABLE Consumer (Numbers integer primary key,\n",
" ...> OriginalID text,\n",
" ...> FinalID text,\n",
" ...> Consumer text,\n",
" ...> ConCommon text,\n",
" ...> ConKingdom text,\n",
" ...> ConPhylum text,\n",
" ...> ConClass text,\n",
" ...> ConOrder text,\n",
" ...> ConFamily text,\n",
" ...> ConGenus text,\n",
" ...> ConSpecies text);\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> .import Consumer.csv Consumer\n",
"```\n",
"Now we have two tables in our database:\n",
"\n",
"```bash\n",
"sqlite> .tables\n",
"Consumer TraitInfo\n",
"```\n",
"\n",
"These tables are connected by two different keys: `OriginalID` and `FinalID`. These are unique IDs for each thermal curve. For each `FinalID` we can get the trait name (`OriginalTraitName`) from the `TraitInfo` table and the corresponding species name (`ConSpecies`) from the `Consumer` table."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> SELECT A1.FinalID, A1.Consumer, A2.FinalID, A2.OriginalTraitName\n",
" ...> FROM Consumer A1, TraitInfo A2\n",
" ...> WHERE A1.FinalID=A2.FinalID LIMIT 8;\n",
"\n",
"FinalID Consumer FinalID OriginalTraitName \n",
"---------- --------------------- ---------- -------------------------\n",
"MTD1 Chaoborus trivittatus MTD1 Resource Consumption Rate\n",
"MTD2 Chaoborus trivittatus MTD2 Resource Consumption Rate\n",
"MTD3 Chaoborus americanus MTD3 Resource Consumption Rate\n",
"MTD4 Stizostedion vitreum MTD4 Resource Mass Consumption\n",
"MTD5 Macrobrachium rosenbe MTD5 Resource Mass Consumption\n",
"MTD6 Ranatra dispar MTD6 Resource Consumption Rate\n",
"MTD7 Ceriodaphnia reticula MTD7 Mass-Specific Mass Consum\n",
"MTD8 Polyphemus pediculus MTD8 Voluntary Body Velocity \n",
"\n",
"# In the same way we assign alias to variables, we can use them for tables.\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example seems easy because both tables have the same number of rows. But the query is still as simple when we have tables with different rows.\n",
"\n",
"Let's import the TCP table:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"sqlite> CREATE TABLE TCP (Numbers integer primary key,\n",
" ...> OriginalID text,\n",
" ...> FinalID text,\n",
" ...> OriginalTraitValue integer,\n",
" ...> OriginalTraitUnit text,\n",
" ...> LabGrowthTemp integer,\n",
" ...> LabGrowthTempUnit text, \n",
" ...> ConTemp integer,\n",
" ...> ConTempUnit text,\n",
" ...> ConTempMethod text,\n",
" ...> ConAcc text,\n",
" ...> ConAccTemp integer);\n",
"\n",
"sqlite> .import TCP.csv TCP\n",
"sqlite> .tables\n",
"Consumer TCP TraitInfo\n",
"```\n",
"\n",
"Now imagine we want to query the thermal performance curves that we have stored for the species Mytilus edulis. Using the FinalID to match the tables, the query can be as simple as:\n",
"\n",
"```bash\n",
"sqlite> CREATE TABLE TCP (Numbers integer primary key,\n",
" ...> OriginalID text,\n",
" ...> FinalID text,\n",
" ...> OriginalTraitValue integer,\n",
" ...> OriginalTraitUnit text,\n",
" ...> LabGrowthTemp integer,\n",
" ...> LabGrowthTempUnit text, \n",
" ...> ConTemp integer,\n",
" ...> ConTempUnit text,\n",
" ...> ConTempMethod text,\n",
" ...> ConAcc text,\n",
" ...> ConAccTemp integer);\n",
"\n",
"sqlite> .import TCP.csv TCP\n",
"sqlite> .tables\n",
"Consumer TCP TraitInfo\n",
"\n",
" \n",
"\n",
"sqlite> SELECT A1.ConTemp, A1.OriginalTraitValue, A2.OriginalTraitName, A3.Consumer\n",
" ...> FROM TCP A1, TraitInfo A2, Consumer A3\n",
" ...> WHERE A1.FinalID=A2.FinalID AND A3.ConSpecies=\"Mytilus edulis\" AND A3.FinalID=A2.FinalID LIMIT 8\n",
"\n",
"ConTemp OriginalTraitValue OriginalTraitName Consumer \n",
"---------- -------------------- ------------------------------ --------------------\n",
"25 2.707075 Filtration Rate Mytilus edulis \n",
"20 3.40721 Filtration Rate Mytilus edulis \n",
"5 3.419455 Filtration Rate Mytilus edulis \n",
"15 3.711165 Filtration Rate Mytilus edulis \n",
"10 3.875465 Filtration Rate Mytilus edulis \n",
"5 0.34 In Vitro Gill Particle Transpo Mytilus edulis \n",
"10 0.46 In Vitro Gill Particle Transpo Mytilus edulis \n",
"15 0.595 In Vitro Gill Particle Transpo Mytilus edulis\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So on and so forth (joining tables etc. would come next...). But if you want to keep practicing and learn more about sqlite commands, this is a very useful site: . You can\n",
"store your queries and database management commands in an ` .sql` file (`geany` will take care of syntax highlighting etc.)\n",
"\n",
"## SQLite with Python\n",
"\n",
"It is easy to access, update and manage SQLite databases with Python (you will find this script file in the `code` directory):\n",
"\n",
"```python\n",
"import sqlite3\n",
"\n",
"conn = sqlite3.connect(\":memory:\")\n",
"\n",
"c = conn.cursor()\n",
"\n",
"c.execute(\"CREATE TABLE tt (Val TEXT)\")\n",
"\n",
"conn.commit()\n",
"\n",
"z = [('a',), ('ab',), ('abc',), ('b',), ('c',)]\n",
"\n",
"c.executemany(\"INSERT INTO tt VALUES (?)\", z)\n",
"\n",
"conn.commit()\n",
"\n",
"c.execute(\"SELECT * FROM tt WHERE Val LIKE 'a%'\").fetchall()\n",
"\n",
"conn.close()\n",
"```\n",
"\n",
"You can create a database in memory, without using the disk — thus you can create and discard an SQLite database within your workflow!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Readings and Resources\n",
"----------------------\n",
"\n",
"* \"The Definitive Guide to SQLite\" is a pretty complete guide to SQLite and freely available from [here]( \n",
" http://sd.blackball.lv/library/The_Definitive_Guide_to_SQLite_2nd_edition.pdf)\n",
"\n",
"* For databses in general, try the [Stanford Introduction to Databases course](https://www.coursera.org/course/db)\n",
"\n",
"* A set of sqlite tutorials in Jupyter: https://github.com/royalosyin/Practice-SQL-with-SQLite-and-Jupyter-Notebook "
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": false,
"autocomplete": false,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": false,
"skip_h1_title": false,
"title_cell": "Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": false,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}