Unverified Commit 1be1e264 authored by Sebastian Schüpbach

change markup for h2 elements

parent 90546d2a
......@@ -10,6 +10,7 @@ Die Wissensdatenbank besteht aus Markdown-Dateien im `docs/`-Verzeichnis, welche
* [x] Set the title in the front matter
* [x] Transfer textual dates into the front matter where present
* [x] Adjust title syntax
* [ ] Add keywords to the front matter
* [ ] Delete obsolete and empty articles
* [ ] Add language tags to code blocks
......@@ -5,8 +5,8 @@ date = '2015-03-31'
# Redland API
## Installation Raptor RDF Syntax Library
<http://librdf.org/raptor/>
Additionally required packages:
......@@ -21,8 +21,8 @@ Zusätzlich benötigte Pakete:
sudo make install
## Installation Rasqal RDF Query Library
<http://librdf.org/rasqal/>
./configure
......@@ -30,8 +30,8 @@ Installation Rasqal RDF Query Library
sudo make install
## Installation Redland RDF API
<http://librdf.org/>
./configure
......@@ -39,8 +39,8 @@ Installation Redland RDF API
sudo make install
## Installation Redland RDF Language Bindings
<http://librdf.org/bindings/>
Additionally required packages:
......
......@@ -5,9 +5,7 @@ date = '2015-05-31'
# Big Data
## Applications
* [Apache Bigtop](https://bigtop.apache.org/)
* [Apache Crunch](https://crunch.apache.org/)
......
......@@ -3,8 +3,9 @@ title = 'Clean Code'
+++
# Clean Code
## Names++
Naming conventions:
1. Choose your names thoughtfully
......@@ -26,8 +27,7 @@ Naming conventions:
4. Functions / classes in a large scope (e.g. public scope): Short names (will probably be used by other people)
## TDD
### Three Laws of TDD
......
......@@ -3,9 +3,8 @@ title = 'Extreme Programming'
+++
# Extreme Programming
## Pair Programming
* [On Pair Programming](https://martinfowler.com/articles/on-pair-programming.html)
......@@ -5,8 +5,8 @@ weight = 1
# Lecture of November 2, 2015
## 01 - Limits of the Web
Different eras of the "Internet":
* Computer-centered processing (until ~1990): No GUI, only terminals
......@@ -21,8 +21,8 @@ Verschiedene Epochen des "Internets":
* The web conceived as a huge decentralized database (knowledge base) with machine-readable information
## 02 - The Importance of Meaning
"Building blocks" von Bedeutung:
* Syntax: ("**The definition of normative structure of data.**")
......@@ -34,8 +34,8 @@ Verschiedene Epochen des "Internets":
* Experience
## 03 - Understanding Content on the Web
Two ways to make web content machine-readable:
* Implicit: NLP
......@@ -55,25 +55,23 @@ Im Web of Data ist es möglich,
**"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."**
(Tim Berners-Lee, James Hendler, Ora Lassila: The Semantic Web, Scientific American, 284(5), pp. 24-43 (2001))
## 04 - Semantic Web Technology and the Web of Data
## 05 - How to Use the Web of Data
Definition of Linked Open Data: "LOD denotes publicly available (RDF) data on the web, identified via URI and accessible via HTTP. Linked data links to other data via URI."
Definition of a semantic entity: An object or thing with a given explicit meaning.
## 06 - How to Name Things - URIs
Definition of Uniform Resource Identifiers: "URI defines a simple and extensible schema for worldwide unique identification of abstract or physical resources (RFC 3986)."
A representation (web page) defines the presentation of the information. In this case the web page is the designator; the thing it describes is the designatum.
Depending on context, the URI can represent either the designatum or the designator: During content negotiation, for example, a request about the Eiffel Tower is sent to DBpedia by passing DBpedia the identifier (the name) of the Eiffel Tower. Here the URI represents the designatum, i.e. the Eiffel Tower itself. Returned is, among other things, the URI of the DBpedia page about the Eiffel Tower. This URI in turn represents the designator, the HTML representation of the information about the Eiffel Tower.
Different representations of the information can live under the same URI; which one is retrieved again depends on content negotiation, e.g. the information represented as HTML or RDF/XML.
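A minimal sketch of such a content negotiation in Java (assumptions: the public DBpedia endpoint is reachable and serves Turtle for this Accept header; the resource URI is real, everything else is illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ContentNegotiation {
    public static void main(String[] args) throws Exception {
        // The resource URI names the designatum (the Eiffel Tower itself)
        URL uri = new URL("http://dbpedia.org/resource/Eiffel_Tower");
        HttpURLConnection conn = (HttpURLConnection) uri.openConnection();
        // Ask for an RDF serialization instead of the HTML representation
        conn.setRequestProperty("Accept", "text/turtle");
        conn.setInstanceFollowRedirects(true); // DBpedia answers with a 303 redirect
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            in.lines().limit(10).forEach(System.out::println);
        }
    }
}
```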
## 07 - How to Represent Simple Facts with RDF
**RDF:**
* **Resource**:
......@@ -105,8 +103,7 @@ Literals können durch
* have their language indicated by a *language tag* (@<lang>)
## 08 - RDF and Turtle Serialization
Prefix definition:
``@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .``
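As a sketch of how such a serialization is processed programmatically, a small example using Apache Jena (choosing Jena is an assumption; any RDF library would do) that parses a Turtle snippet with a prefix definition and a language-tagged literal:

```java
import java.io.StringReader;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class TurtleExample {
    public static void main(String[] args) {
        String turtle =
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n" +
            "<http://example.org/eiffel> rdfs:label \"Tour Eiffel\"@fr .";
        Model model = ModelFactory.createDefaultModel();
        // Arguments: input, base URI (none needed here), serialization language
        model.read(new StringReader(turtle), null, "TURTLE");
        model.write(System.out, "TURTLE"); // echo the parsed graph
    }
}
```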
......
......@@ -5,8 +5,8 @@ weight = 2
# Lecture of November 9, 2015
## 01 - RDF Reification
Reification allows RDF to make statements about statements. rdf:Statement helps with this; it in turn contains rdf:subject, rdf:predicate, and rdf:object.
Reification can be used to
......@@ -15,8 +15,8 @@ Reifikation kann dazu verwendet werden, um
* attach metadata to statements (see the sketch below)
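A minimal sketch with Apache Jena (an assumed library choice; the example.org names are made up) that reifies a statement and attaches metadata to it:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.ReifiedStatement;
import org.apache.jena.rdf.model.Statement;

public class ReificationExample {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Statement stmt = m.createStatement(
            m.createResource("http://example.org/eiffel"),
            m.createProperty("http://example.org/", "height"),
            m.createTypedLiteral(324));
        m.add(stmt);
        // Reify: yields an rdf:Statement with rdf:subject/predicate/object
        ReifiedStatement rs = m.createReifiedStatement(stmt);
        // A statement about the statement itself
        rs.addProperty(m.createProperty("http://example.org/", "assertedBy"),
                       "someone");
        m.write(System.out, "TURTLE");
    }
}
```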
## 02 - Model Building with RDFS
RDFS: RDF Vocabulary Description Language
Among other things, RDFS makes it possible to
......@@ -52,15 +52,14 @@ Weitere Eigenschaften:
In an RDFS serialization, the class definitions are usually recorded first, then the property definitions, and finally the instance definitions.
## 03 - Logical Inference with RDFS
* rdfs:domain defines the domain to which a given rdfs:Property must belong, i.e. a **subject** is always of the type defined by the property's domain.
* rdfs:range defines the range to which a given rdfs:Property refers, i.e. an **object** is always of the type defined by the property's range (see the inference sketch below).
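A sketch of this inference with Jena's built-in RDFS reasoner (the vocabulary is made up): from ``ex:wrote rdfs:domain ex:Author`` and the triple ``ex:goethe ex:wrote ex:faust``, the type ``ex:Author`` is inferred for the subject:

```java
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class RdfsInference {
    public static void main(String[] args) {
        String ex = "http://example.org/";
        Model base = ModelFactory.createDefaultModel();
        Property wrote = base.createProperty(ex, "wrote");
        Resource author = base.createResource(ex + "Author");
        Resource goethe = base.createResource(ex + "goethe");
        base.add(wrote, RDFS.domain, author);
        base.add(goethe, wrote, base.createResource(ex + "faust"));
        // Wrap the base model in an RDFS inference model
        InfModel inf = ModelFactory.createRDFSModel(base);
        // Prints true: goethe rdf:type ex:Author follows via rdfs:domain
        System.out.println(inf.contains(goethe, RDF.type, author));
    }
}
```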
## 04 - How to query RDF(S)? SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is
* a **Query Language** for RDF graph traversal (SPARQL Query Language Specification)
......@@ -81,16 +80,15 @@ FILTER contstraints:
* FILTER cannot assign/create new values
## 05 - SPARQL is more than a Query Language
* ASK (in contrast to SELECT) returns a Boolean value indicating whether a query has matches or not
* DESCRIBE returns information about a resource
* CONSTRUCT creates a new RDF graph according to a template (e.g. (...) ``CONSTRUCT { ?author <http://example.org/hasWritten> ?work . }`` (...)); see the sketch below
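A sketch of running ASK and CONSTRUCT with Jena's ARQ engine (the model and the queries are made-up examples):

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class SparqlForms {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel(); // empty demo model
        Query ask = QueryFactory.create("ASK { ?s ?p ?o }");
        try (QueryExecution qe = QueryExecutionFactory.create(ask, model)) {
            System.out.println(qe.execAsk()); // false on an empty model
        }
        Query construct = QueryFactory.create(
            "CONSTRUCT { ?author <http://example.org/hasWritten> ?work }" +
            " WHERE { ?work <http://example.org/writtenBy> ?author }");
        try (QueryExecution qe = QueryExecutionFactory.create(construct, model)) {
            Model result = qe.execConstruct(); // new graph built from the template
            result.write(System.out, "TURTLE");
        }
    }
}
```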
## 06 - Complex queries with SPARQL
Some further operators:
* FILTER REGEX (<variable>, <regular expression>, <regex-flags>)
......@@ -104,8 +102,7 @@ Einige weitere Operatoren:
Three-valued logic for logical operations: true, false, error
## 07 - More complex SPARQL queries
* Aggregate functions:
  * SELECT **COUNT**(<variable>) AS <new_variable>
......@@ -114,8 +111,8 @@ Drei-wertige Logik bei logischen Operationen: true, false, error
* HAVING can be used to filter aggregations
## 08 - SPARQL Subqueries and Property Paths
Property paths: A possible route through an RDF graph between two nodes. Among other things, they can be seen as an alternative to subqueries:
* **Alternatives**: One or more possibilities apply (e.g. ``{:book dc:title|rdfs:label ?displayString }``)
......
......@@ -5,8 +5,8 @@ weight = 3
# Lecture of November 16, 2015
## 01 - Ontology in Philosophy and Computer Science
"An ontology is an **explict, format specification of a shared conceptualization**. The term is borrowed from philosophy, where an Ontology is a systematic account of existence. For AI systems, what 'esists' is that which can be represented." (Thomas R. Gruber)
* Conceptualization: Abstract model (domain, identified relevant concepts, relations)
......@@ -24,8 +24,8 @@ Terminological Knowledge: Drückt Relationen / Attribute zwischen Klassen aus
Assertional Knowledge: Expresses relations / attributes between instances of classes
## 02 - Ontology Types
Different types of ontologies:
* Top-level ontology: Fundamental ontologies
......@@ -50,8 +50,8 @@ Andere Kategorisierungen:
* Disjunctiveness, inversiveness, Part-of
## 03 - The Foundations of Logic
Working definition: **Logic is the study of how to make formally correct deductions and inferences**
......@@ -63,8 +63,8 @@ Arbeitsdefinition: **Logic is the study of how to make formal correct deductions
How can the semantics of a mathematical logic be expressed? Through models (a model theory), which are meant to be an analogue of the world being modeled.
=> Model-theoretic semantics: Carries out the semantic interpretation of natural or artificial languages by identifying meaning with an exact, formally defined interpretation according to a model (= formal interpretation with a model).
## 04 - Short Recapitulation of Propositional Logic
A formula F is
* **tautological** if it is true under every interpretation I
......@@ -73,33 +73,24 @@ A formula F is
* **not satisfiable** if it is true under no interpretation I (see the formalization below)
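Stated formally (a standard textbook formalization, not taken from the slides):

```latex
\begin{align*}
F \text{ is tautological} &\iff \forall I:\; I \models F\\
F \text{ is satisfiable} &\iff \exists I:\; I \models F\\
F \text{ is not satisfiable} &\iff \neg\exists I:\; I \models F
\end{align*}
```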
## 05 - Short Recapitulation of First Order Logic
Theory = Set of Formulas
Theory = Knowledge Base
Since first-order logic is only semi-decidable (we can prove in finite time that a theory T entails a formula F, but not that T does not entail F), it is not fully usable for automated theorem proving.
## 06 - How to mechanize Reasoning - Tableaux Algorithm
For a logical calculus, the following must be proven:
* Correctness: Every syntactic entailment is also a semantic entailment
* Completeness: All semantic entailments are also syntactic entailments
## 07 - Description Logics
## 08 - Inference and Reasoning
## 09 - DLs and the Open World Assumption
## 10 - Tableaux Algorithm for ALC
......@@ -3,8 +3,9 @@ title = 'Anaconda'
+++
# Anaconda
## Creating an environment
sudo conda create --name <envname> <packages_to_install>
Anaconda creates a new *environment* under ``/opt/anaconda3/envs/<envname>``. The environment can then be activated on the command line as follows:
......
......@@ -20,8 +20,8 @@ Application state is stored either locally in memory or in an embedded database.
Since Flink is a distributed system, local state needs to be protected against failures to avoid data loss in case of application or machine failure. This is guaranteed by periodically writing a consistent checkpoint of the application state to persistent storage; a configuration sketch follows the figure below.
![](./Apache_Flink/checkpointing.png)
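A minimal sketch of enabling this checkpointing in the DataStream API (the interval and the stand-in source are arbitrary example values):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        // Write a consistent checkpoint of all operator state every 10 s
        env.enableCheckpointing(10_000L);
        env.fromElements(1, 2, 3).print(); // stand-in for a real dataflow
        env.execute("checkpointed-job");
    }
}
```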
## Key concepts
**Dataflow program**: Describes how data flows between operations; commonly represented as directed graphs
Nodes in a dataflow program are called **operators** and represent computations; basic functional units of dataflow applications
Data source: Operator without input port
......@@ -47,8 +47,8 @@ Data exchange strategies (automatically chosen by the execution engine or manual
**Latency**: Indicates how long it takes for an event to be processed
**Throughput**: Processing capacity (how many events are processed in a unit of time?). Usually the goal is to ensure that the system can handle the maximum expected rate of events, otherwise **backpressure** happens
## Window Operations
* Window operations continuously create finite sets of events called **buckets** from an unbounded event stream and let us perform computations on these finite sets.
* Window policies: When are new buckets created? Which events go to which buckets (usually based on data properties, on counts, or on time)? When do the contents of a bucket get evaluated (trigger condition)?
......@@ -59,8 +59,8 @@ Window Operations
* Usually windows are run in parallel. In parallel windows, each partition applies the window policies independently of other partitions; see the sketch after this list
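A sketch of a keyed tumbling window in the DataStream API (the source, window size, and names are assumptions):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> words = env.fromElements("a", "b", "a"); // demo source
        words.map(w -> Tuple2.of(w, 1L))
             .returns(Types.TUPLE(Types.STRING, Types.LONG)) // help type erasure
             .keyBy(t -> t.f0)                 // window policies applied per partition
             .window(TumblingProcessingTimeWindows.of(Time.seconds(10))) // bucket policy
             .sum(1)                           // evaluated when the window fires
             .print();
        env.execute("windowed-word-count");
    }
}
```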
## Time Semantics
Advantages of **processing time**:
* Introduces lowest latency possible
......@@ -85,8 +85,8 @@ Advantages of **event time**:
* **Allowed lateness**: Instead of always delaying the firing of a window, allowed lateness re-triggers the firing of the window if a delayed event arrives within the allowed timespan. The action should of course be idempotent to guarantee exactly-once semantics; see the sketch below.
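A sketch of event time plus allowed lateness (the watermark API shown is from newer Flink releases than the 1.3 docs referenced elsewhere in these notes; all values are arbitrary, and in this toy the tuple's second field doubles as event timestamp and summed value):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class LatenessExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<String, Long>> events =
            env.fromElements(Tuple2.of("a", 0L), Tuple2.of("a", 1_000L));
        events.assignTimestampsAndWatermarks(
                  WatermarkStrategy
                      .<Tuple2<String, Long>>forBoundedOutOfOrderness(
                          Duration.ofSeconds(5))        // tolerated out-of-orderness
                      .withTimestampAssigner((e, ts) -> e.f1))
              .keyBy(e -> e.f0)
              .window(TumblingEventTimeWindows.of(Time.minutes(5)))
              .allowedLateness(Time.minutes(1)) // re-fires the window for late events
              .sum(1)
              .print();
        env.execute("lateness-example");
    }
}
```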
## State and Consistency Models
Challenges:
* State management: The system needs to efficiently manage the state and make sure it is protected from concurrent updates
......@@ -132,14 +132,14 @@ Primitives in Operator State:
* Broadcast state: Designed for the special case where the state of each task of an operator is identical. This property can be leveraged during checkpoints and when rescaling an operator.
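Complementing the operator-state notes, a sketch of Flink's *keyed* state API: a counter per key that is checkpointed and restored by the runtime (names are assumed):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Counts events per key; the count survives failures via checkpoints
public class CountingFlatMap extends RichFlatMapFunction<String, Long> {
    private transient ValueState<Long> count;

    @Override
    public void open(Configuration conf) {
        // Registered state is managed (and checkpointed) by Flink
        count = getRuntimeContext().getState(
            new ValueStateDescriptor<>("count", Types.LONG));
    }

    @Override
    public void flatMap(String value, Collector<Long> out) throws Exception {
        Long current = count.value();   // scoped to the current key
        long next = (current == null) ? 1L : current + 1L;
        count.update(next);
        out.collect(next);
    }
}
```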
## Streams API
### Operators
![](./Apache_Flink/flink_streams_api_operators.png)
## Architecture of Flink
### Flink Cluster Components
......@@ -166,8 +166,8 @@ Architecture of Flink
* Depending on how an application is submitted for execution, a dispatcher might not be required (applications are then delivered directly to the JobManager which executes them immediately)
Setup & Deployment
------------------
## Setup & Deployment
### Deployment Modes
......@@ -185,12 +185,12 @@ Setup & Deployment
* [Flink Docker Image on Github](https://github.com/docker-flink/docker-flink)
## Notes on the example application
For the [example application](https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/setup_quickstart.html) to work, hostname resolution must work correctly on the guest system. For Arch Linux, see <https://wiki.archlinux.org/index.php/Hostname#Set_the_hostname>.
## Resources
* [How To Size Your Apache Flink Cluster: A Back-of-the-Envelope Calculation](https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines)
* [4 Ways to Optimize Your Flink Applications](https://dzone.com/articles/four-ways-to-optimize-your-flink-applications)
......
......@@ -3,8 +3,8 @@ title = 'Apache Hadoop'
+++
# Apache Hadoop
## Installation
* For installation notes, see <https://wiki.archlinux.org/index.php/Hadoop>.
* For the subsequent configuration of the cluster, see the notes in the book Hadoop: The Definitive Guide, Appendix A: Installing Apache Hadoop, Configuration: Pseudodistributed Mode
......
......@@ -5,8 +5,7 @@ title = 'Kafka Connect'
# Kafka Connect
A framework for connecting Kafka to external systems such as databases, key-value stores, search indexes, or file systems.
## Core concepts
* **Connectors**: The high level abstraction that coordinates data streaming by managing tasks
* **Connector instance**: Logical job that is responsible for managing the copying of data between Kafka and another system
......@@ -22,8 +21,8 @@ Core concepts
## Further resources
<https://kafka.apache.org/documentation.html#connect>
<https://docs.confluent.io/current/connect/index.html>
<https://kafka.apache.org/21/javadoc/index.html?org/apache/kafka/connect>
......
......@@ -18,8 +18,7 @@ title = 'Kafka Streams'
* **It is not recommended to write the result to an external system inside a Kafka Streams application**. Use Kafka Streams to transform the data and then use the Kafka Connect API to do the writing
## Basic vocabulary
* **Stream**: Unbounded sequence of immutable data records, that is fully ordered, can be replayed, and is fault tolerant
* **Stream processor**: Node in the processor topology. It transforms incoming streams, record by record, and may create a new stream from it
......@@ -28,8 +27,7 @@ Basic vocabulary
* **Topology**: Graph of processors chained together by streams
## Exactly Once Semantics
* Exactly once is the ability to guarantee that data processing on each message will happen only once, and that pushing the message back to Kafka will also happen effectively only once (Kafka will de-dup). So the guarantee does not extend to exactly-once *delivery*.
* Guaranteed only when both the input and the output system are Kafka; it does not cover Kafka to external systems
......@@ -47,15 +45,13 @@ Exactly Once Semantics
* You fine-tune that setting using ``commit.interval.ms`` (see the configuration sketch below)
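A sketch of enabling this in a Kafka Streams configuration (application and broker names are assumed):

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Turns on transactional, exactly-once processing inside Kafka
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG,
                  StreamsConfig.EXACTLY_ONCE);
        // Commit interval; note the default differs between at-least-once
        // and exactly-once modes
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);
        System.out.println(props); // pass these to new KafkaStreams(...)
    }
}
```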
## KStream and KTables Duality
* **Stream as Table**: A stream can be considered a changelog of a table, where each data record in the stream captures a state change of the table. Two ways to create: ``groupByKey()`` + aggregation (``count``, ``aggregate``, ``reduce``) or write back to Kafka and read as KTable
* **Table as Stream**: A table can be considered a snapshot, at a point in time, of the latest value for each key in a stream (a stream's data records are key-value pairs) (``toStream()``); both directions are sketched below
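Both directions as a sketch in the Streams DSL (topic names are assumed):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class Duality {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("events");
        // Stream -> table: aggregate the changelog per key
        KTable<String, Long> counts = stream.groupByKey().count();
        // Table -> stream: every table update becomes a record
        counts.toStream()
              .to("event-counts", Produced.with(Serdes.String(), Serdes.Long()));
    }
}
```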
## Internal Topics
* Running a Kafka Streams application may create internal intermediary topics
* Two types:
......@@ -68,8 +64,8 @@ Internal Topics
* Never add to / delete them!
## Application setup
Basic dependencies:
* ``org.apache.kafka:kafka-streams``
......@@ -140,8 +136,8 @@ Basic dependencies:
* a regex that can match one or more topics
## KStream and KTable Simple Operations
Documentation: <https://docs.confluent.io/current/streams/developer-guide.html#transform-a-stream>
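As a warm-up before the comparison below, a sketch of two stateless transformations (topic names and logic are assumed):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class SimpleOps {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> words = builder.stream("words");
        words.filter((key, value) -> value != null)   // drop null values
             .mapValues(value -> value.toUpperCase()) // stateless transform
             .to("words-upper");
    }
}
```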
### KStreams vs. KTables vs. GlobalKTables
......@@ -359,8 +355,8 @@ The first three joins can only happy if the data is **co-partitioned**:
* Repartitioning is done seamlessly behind the scenes but will incur a performance cost (read and write to Kafka)
## Processor API
Besides the Stream DSL, there is also the low-level Processor API. It can be used on its own or leveraged from the Stream DSL.
### From the Stream DSL
......@@ -385,8 +381,8 @@ Marks the stream for data re-partitioning
### Standalone
## Testing
<http://kafka.apache.org/21/documentation/streams/developer-guide/testing.html>
* Tests the Topology object of a Kafka Streams application
......@@ -396,8 +392,8 @@ Testing
* Producer record reader + tests (see the sketch below)
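A sketch with the 2.1-era test utilities from ``kafka-streams-test-utils`` (the topology and topic names are assumed):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.test.ConsumerRecordFactory;

public class TopologyTest {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input")
               .mapValues(String::toUpperCase)
               .to("output");
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                  Serdes.String().getClass());
        try (TopologyTestDriver driver =
                 new TopologyTestDriver(builder.build(), props)) {
            ConsumerRecordFactory<String, String> factory =
                new ConsumerRecordFactory<>("input",
                    new StringSerializer(), new StringSerializer());
            driver.pipeInput(factory.create("input", "key", "hello"));
            // Prints HELLO: the record went through the topology in-memory
            System.out.println(driver.readOutput("output",
                new StringDeserializer(), new StringDeserializer()).value());
        }
    }
}
```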
## Error catching
* To catch any unexpected exceptions, you can set a ``java.lang.Thread.UncaughtExceptionHandler`` before you start the application. This handler is called whenever a stream thread is terminated by an unexpected exception:
......@@ -406,8 +402,8 @@ Error catching
}
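A sketch of such a handler on a trivial assumed topology (the handler body, logging to stderr, is just an example action):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

public class GuardedApp {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input").to("output"); // assumed pass-through topology
        Properties props = new Properties();
        props.put("application.id", "guarded-app");
        props.put("bootstrap.servers", "localhost:9092");
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        // Invoked whenever a stream thread dies from an unexpected exception
        streams.setUncaughtExceptionHandler((thread, throwable) ->
            System.err.println("Thread " + thread.getName()
                + " terminated: " + throwable));
        streams.start();
    }
}
```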
## Resources
* [Interactive Queries in Apache Kafka](https://blog.knoldus.com/interactive-queries-apache-kafka/)
* [Interactive Queries in Apache Kafka Streams](https://blog.codecentric.de/en/2017/03/interactive-queries-in-apache-kafka-streams/)
......
......@@ -4,8 +4,8 @@ title = 'Apache Kafka'
+++
# Apache Kafka
## Preliminaries
### Reasons
......@@ -29,8 +29,7 @@ Preliminaries
### Typical Architecture
![](./Apache_Kafka/typical_architecture.png)
## Topics and partitions
* Topics: Particular stream of data
* Similar to a table in a database
......@@ -135,8 +134,7 @@ Topics and partitions
* Compression only makes sense if non-binary data is sent
## Brokers
* A Kafka cluster is composed of multiple servers (*brokers*)
* Each broker is identified with its ID (integer)
......@@ -149,8 +147,7 @@ Brokers
* The other brokers will synchronize the data (**ISR**: *in-sync replica*)
## Producers
* Producers write data to topics
* They only have to specify the topic name and one broker to connect to, and Kafka will automatically take care of routing the data to the right brokers
......@@ -173,8 +170,7 @@ Producers
* ``linger.ms`` (how long the producer waits before sending, so that messages can be batched); see the producer sketch below
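A minimal producer sketch (broker address, topic, and tuning values are assumed):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DemoProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // one broker suffices
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("linger.ms", "20"); // batch messages for up to 20 ms
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Kafka routes the record to the right partition/broker itself
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```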
## Consumers
* Consumers read data from a topic
* Required: Topic name and one broker
......@@ -199,8 +195,7 @@ Consumers
* ``auto.offset.reset``: What to do if no offsets are available: ``latest``, ``earliest``, or ``none`` (throws an exception); see the consumer sketch below
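And a matching consumer sketch (group id and topic are assumed; a real consumer polls in a loop):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DemoConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest"); // start at the beginning if no offsets exist
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records =
                consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.println(r.key() + " -> " + r.value());
            }
        }
    }
}
```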
Zookeeper