# Time-course analysis web-app

- [Time-course analysis web-app](#time-course-analysis-web-app)
  * [Running the app](#running-the-app)
    + [Running instance](#running-instance)
    + [Running the app on the server](#running-the-app-on-the-server)
    + [Running the app locally](#running-the-app-locally)
    + [Running the app locally with a temporary copy](#running-the-app-locally-with-a-temporary-copy)
  * [Input file](#input-file)
      - [Long format](#long-format)
      - [Wide format](#wide-format)
  * [Unique track IDs](#unique-track-ids)
  * [Modules and Functionality](#modules-and-functionality)


## Running the app
### Running instance
Access the running instance of the app on [shinyapps.io](https://macdobry.shinyapps.io/tcourse-inspector/ "TimeCourse Inspector").

### Running the app on the server
The app can be deployed on an RStudio/Shiny server. Follow the instructions [here](https://shiny.rstudio.com/deploy/ "Shiny - Hosting").

### Running the app locally
Alternatively, after downloading the code, the app can be run within RStudio. Open the `server.R` or `ui.R` file, then click the "Run App" button with the green triangle in the upper right corner of the code window.

The following packages need to be installed in order to run the app locally:

* shiny
* shinyjs
* shinyBS
* shinycssloaders
* data.table
* DT
* ggplot2
* gplots
* plotly
* scales
* grid
* dendextend
* RColorBrewer
* ggthemes
* sparcl
* dtw
* factoextra
* imputeTS
* MASS
* robust
* pracma
* Hmisc

Install each package with the `install.packages('name_of_the_package_from_the_list_above')` command in the RStudio console, or install all of them at once:

```
install.packages(c("shiny", "shinyjs", "shinyBS", "shinycssloaders",
                   "data.table", "DT",
                   "ggplot2", "gplots", "plotly", "scales", "grid",
                   "dendextend", "RColorBrewer", "ggthemes",
                   "sparcl", "dtw", "factoextra",
                   "imputeTS", "MASS", "robust", "pracma", "Hmisc"))
```
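
If some of these packages are already installed, a check along the following lines avoids reinstalling them. This is only a sketch: the helper name `missing_packages` is made up for illustration and is not part of the app.

```r
# Packages required by the app, as listed above.
required <- c("shiny", "shinyjs", "shinyBS", "shinycssloaders",
              "data.table", "DT",
              "ggplot2", "gplots", "plotly", "scales", "grid",
              "dendextend", "RColorBrewer", "ggthemes",
              "sparcl", "dtw", "factoextra",
              "imputeTS", "MASS", "robust", "pracma", "Hmisc")

# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)

# Install only what is missing; guarded so that nothing is downloaded
# when the script runs non-interactively.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

An empty `to_install` means all listed packages are already available.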

### Running the app locally with a temporary copy
Running the following two lines should get you started immediately with a temporary copy of the app:
```
library(shiny)
runGitHub("dmattek/shiny-timecourse-inspector")
```

## Input file

The app recognizes CSV (comma-separated values) files where data columns are separated by a comma and floating point numbers use a dot (full stop). Compressed CSV files in zip or bz2 format can be uploaded directly without decompression. Both long and wide data formats are accepted, but we highly recommend the long format because it allows for multiple groupings and multivariate measurements.

#### Long format
In the long format, the first row should include column headers. The input CSV file should contain at least these three columns:

* Identifier of a time series, i.e. a track label
* Time points
* Time-varying variable

| ID | Time | Meas1 |
|----|------|-------|
| 1  |  1   | 3.3   |
| 1  |  2   | 2.1   |
| 1  |  4   | 4.3   |
|----|------|-------|
| 2  |  1   | 2.8   |
| 2  |  2   | 1.9   |
| 2  |  3   | 1.7   |
| 2  |  4   | 2.2   |

For multivariate time series, additional measurement columns can be added to the input. The GUI then allows for choosing a single variable or a combination of two variables to display.

Time series can be grouped by introducing a grouping column:

| Group | ID | Time | Meas1 |
|-------|----|------|-------|
| gr1   | 1  |  1   | 3.3   |
| gr1   | 1  |  2   | 2.1   |
| gr1   | 1  |  4   | 4.3   |
|-------|----|------|-------|
| gr1   | 2  |  1   | 2.8   |
| gr1   | 2  |  2   | 1.9   |
| gr1   | 2  |  3   | 1.7   |
| gr1   | 2  |  4   | 2.2   |
|-------|----|------|-------|
| gr2   | 1  |  1   | 5.1   |
| gr2   | 1  |  2   | 5.4   |
| gr2   | 1  |  3   | 5.3   |

Introducing a grouping column allows the data to be analysed and displayed per group.
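
To check that a file follows the long format, it can be read in R before uploading. The sketch below uses base R's `read.csv` on an inline copy of the grouped example table, so it runs without extra packages. The variable names are illustrative only and the code is not taken from the app.

```r
# Inline copy of the grouped long-format example; a real file would be
# read with read.csv("path/to/file.csv") instead.
csv_text <- "Group,ID,Time,Meas1
gr1,1,1,3.3
gr1,1,2,2.1
gr1,1,4,4.3
gr1,2,1,2.8
gr1,2,2,1.9
gr1,2,3,1.7
gr1,2,4,2.2
gr2,1,1,5.1
gr2,1,2,5.4
gr2,1,3,5.3"

dat <- read.csv(text = csv_text)

# Number of time points recorded for every track within each group.
n_points <- aggregate(Meas1 ~ Group + ID, data = dat, FUN = length)

# Mean of the measurement per group and time point.
grp_mean <- aggregate(Meas1 ~ Group + Time, data = dat, FUN = mean)
```

Tracks do not need the same number of time points: here track 1 of `gr1` has no value at time 3.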

#### Wide format
In wide format, entire univariate time series are stored as rows, with columns treated as time points. The first two columns should contain a grouping and the identifier of time series.

| Group | ID | 0   | 1   | 2   | further time points |
|-------|----|-----|-----|-----|---------------------|
| gr1   | 1  | 3.0 | 3.3 | 3.1 | ...                 |
| gr1   | 2  | 2.0 | 2.1 | 1.9 | ...                 |
| gr2   | 1  | 4.9 | 5.1 | 5.0 | ...                 |
| gr2   | 2  | 5.2 | 5.4 | 5.3 | ...                 |
| gr2   | 3  | 5.5 | 5.3 | 5.6 | ...                 |

We do not recommend this format because of its lack of flexibility. In wide format, only one grouping column and one measurement can be passed at a time. This means that analysing any new grouping or measurement requires creating a dedicated file.
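
An existing wide-format file can be converted to the long format before uploading. The following is one possible way with base R. It assumes that the first two columns are named `Group` and `ID` as in the table above and that the remaining column headers are numeric time points. The output column name `Meas1` is chosen for illustration.

```r
# Inline wide-format example; check.names = FALSE keeps the numeric
# column headers ("0", "1", "2") unchanged.
wide <- read.csv(text = "Group,ID,0,1,2
gr1,1,3.0,3.3,3.1
gr1,2,2.0,2.1,1.9
gr2,1,4.9,5.1,5.0", check.names = FALSE)

# Every column except the grouping and the track identifier is a time point.
time_cols <- setdiff(names(wide), c("Group", "ID"))

# Stack one block of rows per time point.
long <- do.call(rbind, lapply(time_cols, function(tc) {
  data.frame(Group = wide$Group,
             ID    = wide$ID,
             Time  = as.numeric(tc),
             Meas1 = wide[[tc]])
}))

# Sort so that each track's time points are contiguous.
long <- long[order(long$Group, long$ID, long$Time), ]
rownames(long) <- NULL
```

The resulting `long` data frame has one row per track and time point and can be saved with `write.csv(long, "long.csv", row.names = FALSE)`.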

## Unique track IDs

For the analysis, track labels need to be unique across the entire dataset. If the track label column is not unique in the uploaded dataset, there's an option in the UI to create a unique track ID. Tick the *Create unique track label* box and choose the grouping columns that will be added to the existing non-unique track label.

In the example above, the `ID` column is not unique across the dataset (ID=1 is repeated in groups `gr1` and `gr2`); therefore, the unique track label has to consist of columns `Group` and `ID`. The resulting track labels will be `gr1_1`, `gr2_1`, etc.
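
The same kind of label can also be built outside the app before uploading. A minimal sketch, assuming an underscore separator as in the labels above; the column name `TrackUni` is hypothetical:

```r
# Track identifiers that repeat across groups.
dat <- data.frame(Group = c("gr1", "gr1", "gr2"),
                  ID    = c(1, 2, 1))

# Combine the grouping column and the track ID into a unique label.
dat$TrackUni <- paste(dat$Group, dat$ID, sep = "_")

# anyDuplicated() returns 0 when all labels are unique.
all_unique <- anyDuplicated(dat$TrackUni) == 0
```

With several grouping columns, all of them would go into the `paste()` call.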


## Modules and Functionality

The app opens with a default window that allows plotting population averages, individual time series, and power spectral density.

The following features of time series analysis are available in the app:

- Perform simple **math calculations** on an individual variable (inversion 1/X), or on two variables (division, sum, multiplication, subtraction).
- **Trim** the time axis of the data.
- **Normalise** to the average of data points in a selected interval. Time series can be normalised with respect to the entire dataset, a group, or a single time series. The latter normalises every time course to the mean of its own selected interval.
- **Remove outlier time points** by removing a percentage of data from the top, bottom, or both tails of pooled data points. Gaps in time series due to outlier removal can then be linearly interpolated, or the affected tracks can be removed entirely from the set. The UI allows for selecting the size of gaps above which the track is removed.
- **Highlight** individual time series by selecting a unique series identifier.
- Calculate the area under individual time series and visualise it as a dot-, violin-, or box-plot. The UI allows for selecting the time range used for the **AUC** calculation.
- Display a dot-, violin-, box-, or line-plot for selected time points.
- Display a scatter-plot to identify **correlations** between two time points.
- Calculate the **power spectral density (PSD)** using a smoothed periodogram or an autoregressive fit. Both estimates rely on R's built-in `spectrum` function. PSD plots can be visualised in the frequency or period domain, independently for each group of time series. Axes can be transformed with common functions (log, inverse, etc.) to facilitate the identification of spectral patterns.
- Perform **hierarchical and sparse-hierarchical clustering** of individual time series. In these modules, the dendrogram can be cut at a chosen level to help visualise clusters. Also available are plots with cluster averages, individual time series per cluster, and the contribution of time series from different groupings to clusters.
- Perform **cluster validation**. Both relative and internal validation are available in this module. Relative validation sweeps through a range of possible cluster numbers and reports the average silhouette width and the within-cluster sum of squares. Internal validation, for a fixed number of clusters, returns three visualisations: a dendrogram coloured according to the cut, a silhouette plot, and a plot of the clusters on the first two principal components. This analysis relies on the implementation in the R package `factoextra`.
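
Some of these operations are simple enough to reproduce outside the app, for instance to sanity-check results. The following base-R sketch is not the app's own code. It assumes that normalisation means dividing by the mean of the selected interval and that the AUC is computed with the trapezoidal rule, and it uses a synthetic sine wave as a stand-in track. Only `spectrum` is named in this README; the helper `auc_trapz` is hypothetical.

```r
# Synthetic stand-in for one track: a sine wave with period 10 around a level of 5.
time <- 1:100
meas <- 5 + sin(2 * pi * time / 10)

# Normalisation, assumed here to be division by the mean of a selected interval
# (time points 1 to 20).
in_interval <- time >= 1 & time <= 20
meas_norm   <- meas / mean(meas[in_interval])

# Hypothetical helper: area under the curve with the trapezoidal rule;
# it also works for unevenly spaced time points.
auc_trapz <- function(t, y) {
  sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)
}

# Track 1 from the long-format example table (time points 1, 2, 4).
auc_track1 <- auc_trapz(c(1, 2, 4), c(3.3, 2.1, 4.3))

# Power spectral density from a smoothed periodogram via R's built-in spectrum().
psd <- spectrum(meas, method = "pgram", spans = c(3, 3), plot = FALSE)

# Period at which the estimated spectral density is highest.
peak_period <- 1 / psd$freq[which.max(psd$spec)]
```

For this synthetic track, `peak_period` should come out close to 10, the period of the sine wave. Passing `method = "ar"` to `spectrum` selects the autoregressive estimate instead.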