# Time-course analysis web-app

- [Time-course analysis web-app](#time-course-analysis-web-app)
  * [Running the app](#running-the-app)
    + [Running instance](#running-instance)
    + [Running the app on the server](#running-the-app-on-the-server)
    + [Running the app locally](#running-the-app-locally)
    + [Running the app directly from GitHub](#running-the-app-directly-from-github)
  * [Input file](#input-file)
      - [Long format](#long-format)
      - [Wide format](#wide-format)
  * [Unique track IDs](#unique-track-ids)
  * [Modules and Functionality](#modules-and-functionality)


## Running the app
### Running instance
Access the running instance of the app on [shinyapps.io](https://macdobry.shinyapps.io/tcourse-inspector/ "TimeCourse Inspector").

### Running the app on the server
The app can be deployed on an RStudio/Shiny server. Follow the instructions [here](https://shiny.rstudio.com/deploy/ "Shiny - Hosting").

### Running the app locally
Alternatively, after downloading the code, the app can be run within RStudio. Open the `server.R` or `ui.R` file, then click the "Run App" button (with a green triangle) in the upper right corner of the window in which the code is open.

The following packages need to be installed in order to run the app locally:

* shiny
* shinyjs
* shinyBS
* shinycssloaders
* data.table
* DT
* ggplot2
* gplots
* plotly
* scales
* grid
* dendextend
* RColorBrewer
* ggthemes
* sparcl
* dtw
* factoextra
* imputeTS
* MASS
* robust
* pracma
* Hmisc

Install packages using the `install.packages('name_of_the_package_from_the_list_above')` command in the RStudio console, or install them all at once:

```
# Install all packages required by the app in a single call
install.packages(c("shiny", "shinyjs", "shinyBS", "shinycssloaders",
                   "data.table", "DT",
                   "ggplot2", "gplots", "plotly", "scales", "grid",
                   "dendextend", "RColorBrewer", "ggthemes",
                   "sparcl", "dtw", "factoextra",
                   "imputeTS", "MASS", "robust", "pracma", "Hmisc"))
```
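
If some of these packages are already present, it may be convenient to install only the missing ones. The following is a minimal sketch of that idea, not part of the app itself; the helper name `missing_packages` is hypothetical.

```r
# Packages listed above as required by the app
required <- c("shiny", "shinyjs", "shinyBS", "shinycssloaders",
              "data.table", "DT",
              "ggplot2", "gplots", "plotly", "scales", "grid",
              "dendextend", "RColorBrewer", "ggthemes",
              "sparcl", "dtw", "factoextra",
              "imputeTS", "MASS", "robust", "pracma", "Hmisc")

# Hypothetical helper: return those entries of `pkgs` that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)

# Install only what is missing; the interactive() guard avoids side effects
# when the script is sourced in a non-interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running the snippet a second time should leave `to_install` empty once all packages have been installed successfully.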

### Running the app directly from GitHub
Running the following two lines should get you started immediately with a temporary copy of the app:
```
library(shiny)
runGitHub("dmattek/shiny-timecourse-inspector")
```

## Input file

The app recognizes CSV (comma-separated values) files in which data columns are separated by commas and floating-point numbers use a dot (full stop) as the decimal separator. Compressed CSV files in zip or bz2 format can be uploaded directly without decompression. Both long and wide data formats are accepted, but we highly recommend the long format because it allows for multiple groupings and multivariate measurements.

#### Long format
In the long format, the first row should include column headers. The input CSV file should contain at least these three columns:

* Identifier of a time series, i.e. a track label
* Time points
* Time-varying variable

| ID | Time | Meas1 |
|----|------|-------|
| 1  |  1   | 3.3   |
| 1  |  2   | 2.1   |
| 1  |  4   | 4.3   |
|----|------|-------|
| 2  |  1   | 2.8   |
| 2  |  2   | 1.9   |
| 2  |  3   | 1.7   |
| 2  |  4   | 2.2   |
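
To check a long-format file before uploading it, the table above can be reproduced and inspected in R. This is only a sketch: it uses base R's `read.csv` for self-containment, which is an assumption and not necessarily how the app reads its input.

```r
# Example long-format input, identical to the table above
csv_text <- "ID,Time,Meas1
1,1,3.3
1,2,2.1
1,4,4.3
2,1,2.8
2,2,1.9
2,3,1.7
2,4,2.2"

# Read the CSV; for a real file, pass its path instead of `text =`
dt <- read.csv(text = csv_text)

# Verify that the three required columns are present
required_cols <- c("ID", "Time", "Meas1")
stopifnot(all(required_cols %in% names(dt)))

# Count the time points per track; tracks may differ in length
# (here, track 1 has no measurement at time point 3)
points_per_track <- table(dt$ID)
print(points_per_track)
```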

For multivariate time series, additional columns with variables can be added to the input. The GUI then allows choosing a single variable or a combination of two variables to display.

Time series can be grouped by introducing a grouping column:

| Group | ID | Time | Meas1 |
|-------|----|------|-------|
| gr1   | 1  |  1   | 3.3   |
| gr1   | 1  |  2   | 2.1   |
| gr1   | 1  |  4   | 4.3   |
|-------|----|------|-------|
| gr1   | 2  |  1   | 2.8   |
| gr1   | 2  |  2   | 1.9   |
| gr1   | 2  |  3   | 1.7   |
| gr1   | 2  |  4   | 2.2   |
|-------|----|------|-------|
| gr2   | 1  |  1   | 5.1   |
| gr2   | 1  |  2   | 5.4   |
| gr2   | 1  |  3   | 5.3   |

Introducing a grouping column allows the data to be analysed and displayed per group.
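
As an illustration of a per-group analysis, the sketch below computes the average of `Meas1` per group and time point for the grouped table above. It uses base R's `aggregate` and is only meant to show what the grouping column makes possible; it is not the app's own code.

```r
# Grouped long-format data, identical to the table above
grouped <- read.csv(text = "Group,ID,Time,Meas1
gr1,1,1,3.3
gr1,1,2,2.1
gr1,1,4,4.3
gr1,2,1,2.8
gr1,2,2,1.9
gr1,2,3,1.7
gr1,2,4,2.2
gr2,1,1,5.1
gr2,1,2,5.4
gr2,1,3,5.3")

# Mean of Meas1 for every combination of group and time point
group_means <- aggregate(Meas1 ~ Group + Time, data = grouped, FUN = mean)

# Sort by group, then by time, for easier reading
group_means <- group_means[order(group_means$Group, group_means$Time), ]
print(group_means)
```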

#### Wide format
In wide format, entire univariate time series are stored as rows, with columns treated as time points. The first two columns should contain a grouping and the identifier of time series.

| Group | ID | 0   | 1   | 2   | further time points |
|-------|----|-----|-----|-----|---------------------|
| gr1   | 1  | 3.0 | 3.3 | 3.1 | ...                 |
| gr1   | 2  | 2.0 | 2.1 | 1.9 | ...                 |
| gr2   | 1  | 4.9 | 5.1 | 5.0 | ...                 |
| gr2   | 2  | 5.2 | 5.4 | 5.3 | ...                 |
| gr2   | 3  | 5.5 | 5.3 | 5.6 | ...                 |

We do not recommend this format because of its lack of flexibility. In wide format, only one grouping column and one measurement can be passed at a time, which means that every new grouping or measurement requires a dedicated file.
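
Data that already exist in wide format can be converted to the recommended long format before upload. The sketch below does this in base R for the first three time points of the table above; the output column names `Time` and `Meas1` are chosen here to match the long-format example and are otherwise arbitrary.

```r
# Wide-format data from the table above (first three time points only)
wide <- read.csv(text = "Group,ID,0,1,2
gr1,1,3.0,3.3,3.1
gr1,2,2.0,2.1,1.9
gr2,1,4.9,5.1,5.0
gr2,2,5.2,5.4,5.3
gr2,3,5.5,5.3,5.6", check.names = FALSE)  # keep "0", "1", "2" as column names

# Every column other than Group and ID is a time point
time_cols <- setdiff(names(wide), c("Group", "ID"))

# Stack the time-point columns on top of each other (column by column)
long <- data.frame(
  Group = rep(wide$Group, times = length(time_cols)),
  ID    = rep(wide$ID, times = length(time_cols)),
  Time  = rep(as.numeric(time_cols), each = nrow(wide)),
  Meas1 = unlist(wide[time_cols], use.names = FALSE)
)

# Sort by group, track, and time to obtain the usual long-format layout
long <- long[order(long$Group, long$ID, long$Time), ]
```

The resulting data frame can be written with `write.csv(long, "long.csv", row.names = FALSE)`, where the file name is a placeholder.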

## Unique track IDs

For the analysis, track labels need to be unique across the entire dataset. If the track label column is not unique in the uploaded dataset, the UI offers an option to create a unique track ID. Check the *Create unique track label* box and choose the grouping columns that will be combined with the existing non-unique track label.

In the example above, the `ID` column is not unique across the dataset (ID=1 is repeated in group `gr1` and `gr2`), therefore the unique track label has to consist of columns `Group` and `ID`. The resulting track label will be `gr1_1`, `gr2_1`, etc.
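
The same kind of label can also be prepared in the input file before upload. The sketch below mimics the behaviour described above by joining `Group` and `ID` with an underscore; the column name `TrackUni` is hypothetical, and the separator is inferred from the `gr1_1` example rather than from the app's code.

```r
# A few rows from the grouped example; ID 1 occurs in both groups
tracks <- data.frame(
  Group = c("gr1", "gr1", "gr1", "gr2", "gr2"),
  ID    = c(1, 1, 2, 1, 1),
  Time  = c(1, 2, 1, 1, 2)
)

# Build a label that is unique across the whole dataset
tracks$TrackUni <- paste(tracks$Group, tracks$ID, sep = "_")

# List the distinct tracks
unique_tracks <- unique(tracks$TrackUni)
print(unique_tracks)
```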


## Modules and Functionality

The app opens with a default window that allows plotting population averages, individual time series, and the power spectral density.

The following features of time series analysis are available in the app:

- Perform simple **math calculations** on an individual variable (inversion, 1/X) or on two variables (division, sum, multiplication, subtraction).
- **Trim** the time axis of the data.
- **Normalise** to the average of data points in a selected interval. Time series can be normalised with respect to the entire dataset, a group, or a single time series. The last option normalises every time course to the mean of its own selected interval.
- **Remove outlier time points** by removing a percentage of data from the top, bottom, or both tails of pooled data points. Gaps in time series caused by outlier removal can then be linearly interpolated, or the affected tracks can be removed entirely from the set. The UI allows for selecting the gap size above which a track is removed.
- **Highlight** individual time series by selecting a unique series identifier.
- Calculate the area under individual time series (**AUC**) and visualise it as a dot-, violin-, or box-plot. The UI allows for selecting the time range used for the AUC calculation.
- Display a dot-, violin-, box-, or line-plot for selected time points.
- Display a scatter-plot to identify **correlations** between two time points.
- Calculate the **power spectral density (PSD)** using a smoothed periodogram or an autoregressive fit. Both estimates rely on R's built-in function `spectrum`. PSD plots can be visualized in the frequency or period domain, independently for each group of time series. Axes can be transformed with common functions (log, inverse, etc.) to facilitate the identification of spectral patterns.
- Perform **hierarchical and sparse-hierarchical clustering** of individual time series. In these modules, the dendrogram can be cut at a chosen level to help visualise clusters. Additionally, plots are available with cluster averages, individual time series per cluster, and the contribution of time series from different groupings to clusters.
- Perform **cluster validation**. Both relative and internal validation are available. Relative validation sweeps through a range of possible cluster numbers and reports the average silhouette width and the within-cluster sum of squares. Internal validation, for a fixed number of clusters, returns three visualizations: a dendrogram colored according to the cut, a silhouette plot, and a plot of the clusters on the first two principal components. This analysis relies on the R package `factoextra`.
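
To give an idea of what the PSD module computes, the sketch below calls `spectrum` from R's built-in `stats` package on a synthetic series, once with a smoothed periodogram and once with an autoregressive fit. The test signal, the noise level, and the `spans` value are illustrative assumptions; the settings used inside the app may differ.

```r
set.seed(1)

# Synthetic series: an oscillation with a period of 10 time units plus noise
time <- 0:199
x <- sin(2 * pi * time / 10) + rnorm(length(time), sd = 0.1)

# Smoothed periodogram; `spans` gives the widths of the Daniell smoothers
psd_pgram <- spectrum(x, method = "pgram", spans = c(3, 3), plot = FALSE)

# Autoregressive fit; by default the model order is selected by AIC
psd_ar <- spectrum(x, method = "ar", plot = FALSE)

# Frequency (cycles per time unit) with the highest power, and its period
peak_freq   <- psd_pgram$freq[which.max(psd_pgram$spec)]
peak_period <- 1 / peak_freq
print(c(frequency = peak_freq, period = peak_period))
```

Plotting `psd_pgram$spec` against `1 / psd_pgram$freq` instead of `psd_pgram$freq` gives the period-domain view mentioned above.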