Version bump: 2.5.0

pep8
Rework user hack for "login-free" sessions #394
2025-12-22 11:01:23 +00:00 · 2018-10-07 16:30:36 +01:00 · 2018-10-07 16:30:02 +01:00 · 2018-10-07 16:27:41 +01:00 · 2018-10-07 16:26:05 +01:00 · 2018-10-07 16:25:51 +01:00
19 changed files with 329 additions and 124 deletions
--- a/.travis.yml
+++ b/.travis.yml
@@ -2,7 +2,7 @@ language: python

 before_install:
 - sudo apt-get update -qq
- sudo apt-get install -qq libpoppler-cpp-dev unpaper tesseract-ocr tesseract-ocr-eng
+- sudo apt-get install -qq libpoppler-cpp-dev unpaper tesseract-ocr tesseract-ocr-eng tesseract-ocr-cat

 sudo: false

--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM alpine:3.7
+FROM alpine:3.8

 LABEL maintainer="The Paperless Project https://github.com/danielquinn/paperless" \
      contributors="Guy Addadi <addadi@gmail.com>, Pit Kleyersburg <pitkley@googlemail.com>, \
@@ -12,11 +12,10 @@ COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
 ENV PAPERLESS_EXPORT_DIR=/export \
    PAPERLESS_CONSUMPTION_DIR=/consume

-# Install dependencies
-RUN apk --no-cache --update add \
-        python3 gnupg libmagic bash shadow curl \
-        sudo poppler tesseract-ocr imagemagick ghostscript unpaper && \
-    apk --no-cache add --virtual .build-dependencies \
+
+RUN apk update --no-cache && apk add python3 gnupg libmagic bash shadow curl \
+        sudo poppler tesseract-ocr imagemagick ghostscript unpaper optipng && \
+    apk add --virtual .build-dependencies \
        python3-dev poppler-dev gcc g++ musl-dev zlib-dev jpeg-dev && \
 # Install python dependencies
    python3 -m ensurepip && \
--- a/docs/changelog.rst
+++ b/docs/changelog.rst
@@ -1,6 +1,49 @@
 Changelog
 #########

+2.5.0
+=====
+
+* **New dependency**: Paperless now optimises thumbnail file sizes with
+  `optipng`_, so you'll need to install that somewhere in your PATH or declare
+  its location in ``PAPERLESS_OPTIPNG_BINARY``.  The Docker image has already
+  been updated on the Docker Hub, so you just need to pull the latest one from
+  there if you're a Docker user.
+
+* "Login free" instances of Paperless were breaking whenever you tried to edit
+  objects in the admin: adding/deleting tags or correspondents, or even fixing
+  spelling.  This was due to the "user hack" we were applying to sessions that
+  weren't using a login, as that hack user didn't have a valid id.  The fix was
+  to attribute the first user id in the system to this hack user.  `#394`_
+
+* A problem in how we handle slug values on Tags and Correspondents required a
+  few changes to how we handle this field `#393`_:
+
+  1. Slugs are no longer editable.  They're derived from the name of the tag or
+     correspondent at save time, so if you wanna change the slug, you have to
+     change the name, and even then you're restricted to the rules of the
+     ``slugify()`` function.  The slug value is still visible in the admin
+     though.
+  2. I've added a migration to go over all existing tags & correspondents and
+     rewrite the ``.slug`` values to ones conforming to the ``slugify()``
+     rules.
+  3. The consumption process now uses the same rules as ``.save()`` in
+     determining a slug and using that to check for an existing
+     tag/correspondent.
+
+* An annoying bug in the date capture code was causing some bogus dates to be
+  attached to documents, which in turn busted the UI.  Thanks to `Andrew Peng`_
+  for reporting this. `#414`_.
+
+* A bug in the Dockerfile meant that Tesseract language files weren't being
+  installed correctly.  `euri10`_ was quick to provide a fix: `#406`_, `#413`_.
+
+* Document consumption is now wrapped in a transaction as per an old ticket
+  `#262`_.
+
+* The ``get_date()`` functionality of the parsers has been consolidated onto
+  the ``DocumentParser`` class since much of that code was redundant anyway.
+
 2.4.0
 =====

@@ -525,6 +568,8 @@ bulk of the work on this big change.
 .. _ahyear: https://github.com/ahyear
 .. _jonaswinkler: https://github.com/jonaswinkler
 .. _thepill: https://github.com/thepill
+.. _Andrew Peng: https://github.com/pengc99
+.. _euri10: https://github.com/euri10

 .. _#20: https://github.com/danielquinn/paperless/issues/20
 .. _#44: https://github.com/danielquinn/paperless/issues/44
@@ -590,6 +635,7 @@ bulk of the work on this big change.
 .. _#322: https://github.com/danielquinn/paperless/pull/322
 .. _#328: https://github.com/danielquinn/paperless/pull/328
 .. _#253: https://github.com/danielquinn/paperless/issues/253
+.. _#262: https://github.com/danielquinn/paperless/issues/262
 .. _#323: https://github.com/danielquinn/paperless/issues/323
 .. _#344: https://github.com/danielquinn/paperless/pull/344
 .. _#351: https://github.com/danielquinn/paperless/pull/351
@@ -606,13 +652,19 @@ bulk of the work on this big change.
 .. _#391: https://github.com/danielquinn/paperless/pull/391
 .. _#390: https://github.com/danielquinn/paperless/pull/390
 .. _#392: https://github.com/danielquinn/paperless/issues/392
+.. _#393: https://github.com/danielquinn/paperless/issues/393
 .. _#395: https://github.com/danielquinn/paperless/pull/395
+.. _#394: https://github.com/danielquinn/paperless/issues/394
 .. _#396: https://github.com/danielquinn/paperless/pull/396
 .. _#399: https://github.com/danielquinn/paperless/pull/399
 .. _#400: https://github.com/danielquinn/paperless/pull/400
 .. _#401: https://github.com/danielquinn/paperless/pull/401
 .. _#405: https://github.com/danielquinn/paperless/pull/405
+.. _#406: https://github.com/danielquinn/paperless/issues/406
 .. _#412: https://github.com/danielquinn/paperless/issues/412
+.. _#413: https://github.com/danielquinn/paperless/pull/413
+.. _#414: https://github.com/danielquinn/paperless/issues/414

 .. _pipenv: https://docs.pipenv.org/
 .. _a new home on Docker Hub: https://hub.docker.com/r/danielquinn/paperless/
+.. _optipng: http://optipng.sourceforge.net/
--- a/paperless.conf.example
+++ b/paperless.conf.example
@@ -213,3 +213,23 @@ PAPERLESS_DEBUG="false"
 # The number of years for which a correspondent will be included in the recent
 # correspondents filter.
 #PAPERLESS_RECENT_CORRESPONDENT_YEARS=1
+
+###############################################################################
+####                     Third-Party Binaries                              ####
+###############################################################################
+
+# There are a few external software packages that Paperless expects to find on
+# your system when it starts up.  Unless you've done something creative with
+# their installation, you probably won't need to edit any of these.  However,
+# if you've installed these programs somewhere where simply typing the name of
+# the program doesn't automatically execute it (i.e. the program isn't in your
+# $PATH), then you'll need to specify the literal path for that program here.
+
+# Convert (part of the ImageMagick suite)
+#PAPERLESS_CONVERT_BINARY=/usr/bin/convert
+
+# Unpaper
+#PAPERLESS_UNPAPER_BINARY=/usr/bin/unpaper
+
+# Optipng (for optimising thumbnail sizes)
+#PAPERLESS_OPTIPNG_BINARY=/usr/bin/optipng
--- a/src/documents/admin.py
+++ b/src/documents/admin.py
@@ -125,6 +125,8 @@ class CorrespondentAdmin(CommonAdmin):
    list_filter = ("matching_algorithm",)
    list_editable = ("match", "matching_algorithm")

+    readonly_fields = ("slug",)
+
    def get_queryset(self, request):
        qs = super(CorrespondentAdmin, self).get_queryset(request)
        qs = qs.annotate(
@@ -149,6 +151,8 @@ class TagAdmin(CommonAdmin):
    list_filter = ("colour", "matching_algorithm")
    list_editable = ("colour", "match", "matching_algorithm")

+    readonly_fields = ("slug",)
+
    def get_queryset(self, request):
        qs = super(TagAdmin, self).get_queryset(request)
        qs = qs.annotate(document_count=models.Count("documents"))
@@ -167,7 +171,7 @@ class DocumentAdmin(CommonAdmin):
        }

    search_fields = ("correspondent__name", "title", "content", "tags__name")
-    readonly_fields = ("added",)
+    readonly_fields = ("added", "file_type", "storage_type",)
    list_display = ("title", "created", "added", "thumbnail", "correspondent",
                    "tags_")
    list_filter = (
--- a/src/documents/consumer.py
+++ b/src/documents/consumer.py
@@ -1,3 +1,4 @@
+from django.db import transaction
 import datetime
 import hashlib
 import logging
@@ -111,8 +112,11 @@ class Consumer:
                if not self.try_consume_file(file):
                    self._ignore.append((file, mtime))

+    @transaction.atomic
    def try_consume_file(self, file):
-        "Return True if file was consumed"
+        """
+        Return True if file was consumed
+        """

        if not re.match(FileInfo.REGEXES["title"], file):
            return False
@@ -145,7 +149,7 @@ class Consumer:
        parsed_document = parser_class(doc)

        try:
-            thumbnail = parsed_document.get_thumbnail()
+            thumbnail = parsed_document.get_optimised_thumbnail()
            date = parsed_document.get_date()
            document = self._store(
                parsed_document.get_text(),
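`try_consume_file()` is now decorated with `@transaction.atomic`. If an exception propagates out of the method, every database write made during that consumption attempt is rolled back.

The snippet below is a minimal sketch of the same all-or-nothing behaviour. It swaps Django's ORM for the standard library's `sqlite3`, whose connection object commits on a clean exit from a `with` block and rolls back when an exception escapes it. The `document`/`tag` tables and the `consume()` helper are hypothetical.

```python
import sqlite3


def consume(conn, title, tags, fail=False):
    """Hypothetical stand-in for try_consume_file(): store a document and
    its tags inside a single transaction."""
    # `with conn:` commits on success and rolls back if an exception
    # escapes, the same guarantee transaction.atomic gives Django code.
    with conn:
        cur = conn.execute("INSERT INTO document (title) VALUES (?)", (title,))
        doc_id = cur.lastrowid
        for tag in tags:
            conn.execute(
                "INSERT INTO tag (document_id, name) VALUES (?, ?)",
                (doc_id, tag),
            )
        if fail:
            # Simulate a failure after some rows were already written.
            raise RuntimeError("consumption failed")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE tag (document_id INTEGER, name TEXT)")

consume(conn, "Invoice", ["bills"])
try:
    consume(conn, "Broken scan", ["bills", "tax"], fail=True)
except RuntimeError:
    pass

# Only the first document survives; the failed attempt left no rows behind.
print(conn.execute("SELECT title FROM document").fetchall())  # → [('Invoice',)]
```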
--- a/src/documents/filters.py
+++ b/src/documents/filters.py
@@ -1,4 +1,4 @@
-from django_filters.rest_framework import CharFilter, FilterSet, BooleanFilter, ModelChoiceFilter
+from django_filters.rest_framework import BooleanFilter, FilterSet

 from .models import Correspondent, Document, Tag

--- a/src/documents/migrations/0022_auto_20181007_1420.py
+++ b/src/documents/migrations/0022_auto_20181007_1420.py
@@ -0,0 +1,52 @@
+# Generated by Django 2.0.8 on 2018-10-07 14:20
+
+from django.db import migrations, models
+from django.utils.text import slugify
+
+
+def re_slug_all_the_things(apps, schema_editor):
+    """
+    Rewrite all slug values to make sure they're actually slugs before we brand
+    them as uneditable.
+    """
+
+    Tag = apps.get_model("documents", "Tag")
+    Correspondent = apps.get_model("documents", "Correspondent")
+
+    for klass in (Tag, Correspondent):
+        for instance in klass.objects.all():
+            klass.objects.filter(
+                pk=instance.pk
+            ).update(
+                slug=slugify(instance.slug)
+            )
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('documents', '0021_document_storage_type'),
+    ]
+
+    operations = [
+        migrations.AlterModelOptions(
+            name='tag',
+            options={'ordering': ('name',)},
+        ),
+        migrations.AlterField(
+            model_name='correspondent',
+            name='slug',
+            field=models.SlugField(blank=True, editable=False),
+        ),
+        migrations.AlterField(
+            model_name='document',
+            name='file_type',
+            field=models.CharField(choices=[('pdf', 'PDF'), ('png', 'PNG'), ('jpg', 'JPG'), ('gif', 'GIF'), ('tiff', 'TIFF'), ('txt', 'TXT'), ('csv', 'CSV'), ('md', 'MD')], editable=False, max_length=4),
+        ),
+        migrations.AlterField(
+            model_name='tag',
+            name='slug',
+            field=models.SlugField(blank=True, editable=False),
+        ),
+        migrations.RunPython(re_slug_all_the_things, migrations.RunPython.noop)
+    ]
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -11,6 +11,7 @@ from django.conf import settings
 from django.db import models
 from django.template.defaultfilters import slugify
 from django.utils import timezone
+from django.utils.text import slugify
 from fuzzywuzzy import fuzz

 from .managers import LogManager
@@ -37,7 +38,7 @@ class MatchingModel(models.Model):
    )

    name = models.CharField(max_length=128, unique=True)
-    slug = models.SlugField(blank=True)
+    slug = models.SlugField(blank=True, editable=False)

    match = models.CharField(max_length=256, blank=True)
    matching_algorithm = models.PositiveIntegerField(
@@ -147,8 +148,6 @@ class MatchingModel(models.Model):
    def save(self, *args, **kwargs):

        self.match = self.match.lower()
-
-        if not self.slug:
        self.slug = slugify(self.name)

        models.Model.save(self, *args, **kwargs)
@@ -452,7 +451,7 @@ class FileInfo:
        r = []
        for t in tags.split(","):
            r.append(Tag.objects.get_or_create(
-                slug=t.lower(),
+                slug=slugify(t),
                defaults={"name": t}
            )[0])
        return tuple(r)
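`save()` now always derives `slug` from `name`, and `FileInfo` looks tags up by `slugify(t)` instead of `t.lower()`. As a result, a tag typed into the admin and the same tag parsed from a filename share one slug.

The following is a minimal sketch of that lookup under two assumptions:

- `slugify()` here is a standard-library approximation of Django's `django.utils.text.slugify()`; real Paperless code should use the Django function.
- A plain dict plays the role of `Tag.objects.get_or_create()`, and `get_or_create_tag()` is a hypothetical helper.

```python
import re
import unicodedata


def slugify(value):
    """Approximation of django.utils.text.slugify() without allow_unicode."""
    value = unicodedata.normalize("NFKD", str(value))
    value = value.encode("ascii", "ignore").decode("ascii")
    # Drop anything that isn't a word character, whitespace or a hyphen.
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value)


def get_or_create_tag(registry, name):
    """Hypothetical dict-backed stand-in for
    Tag.objects.get_or_create(slug=slugify(name), defaults={"name": name})."""
    slug = slugify(name)
    created = slug not in registry
    if created:
        registry[slug] = {"name": name, "slug": slug}
    return registry[slug], created


tags = {}
get_or_create_tag(tags, "Tax Stuff")  # e.g. created in the admin
tag, created = get_or_create_tag(tags, "tax stuff")  # e.g. from a filename
print(tag["name"], created)  # → Tax Stuff False
```

Under the old `slug=t.lower()` lookup, `"tax stuff"` would have been looked up with the slug `tax stuff`. That is not a valid slug, and it would not have matched the admin-created `tax-stuff`.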
--- a/src/documents/parsers.py
+++ b/src/documents/parsers.py
@@ -1,9 +1,13 @@
 import logging
-import shutil
-import tempfile
+import os
 import re
+import shutil
+import subprocess
+import tempfile

+import dateparser
 from django.conf import settings
+from django.utils import timezone

 # This regular expression will try to find dates in the document at
 # hand and will match the following formats:
@@ -32,6 +36,8 @@ class DocumentParser:
    """

    SCRATCH = settings.SCRATCH_DIR
+    DATE_ORDER = settings.DATE_ORDER
+    OPTIPNG = settings.OPTIPNG_BINARY

    def __init__(self, path):
        self.document_path = path
@@ -45,6 +51,19 @@ class DocumentParser:
        """
        raise NotImplementedError()

+    def optimise_thumbnail(self, in_path):
+
+        out_path = os.path.join(self.tempdir, "optipng.png")
+
+        args = (self.OPTIPNG, "-o5", in_path, "-out", out_path)
+        if subprocess.Popen(args).wait() != 0:
+            raise ParseError("Optipng failed at {}".format(args))
+
+        return out_path
+
+    def get_optimised_thumbnail(self):
+        return self.optimise_thumbnail(self.get_thumbnail())
+
    def get_text(self):
        """
        Returns the text from the document and only the text.
@@ -55,7 +74,52 @@ class DocumentParser:
        """
        Returns the date of the document.
        """
-        raise NotImplementedError()
+
+        date = None
+        date_string = None
+
+        try:
+            text = self.get_text()
+        except ParseError:
+            return None
+
+        next_year = timezone.now().year + 5  # Arbitrary 5 year future limit
+
+        # Iterate through all regex matches and try to parse the date
+        for m in re.finditer(DATE_REGEX, text):
+
+            date_string = m.group(0)
+
+            try:
+                date = dateparser.parse(
+                    date_string,
+                    settings={
+                        "DATE_ORDER": self.DATE_ORDER,
+                        "PREFER_DAY_OF_MONTH": "first",
+                        "RETURN_AS_TIMEZONE_AWARE": True
+                    }
+                )
+            except TypeError:
+                # Skip all matches that do not parse to a proper date
+                continue
+
+            if date is not None and next_year > date.year > 1900:
+                break
+            else:
+                date = None
+
+        if date is not None:
+            self.log(
+                "info",
+                "Detected document date {} based on string {}".format(
+                    date.isoformat(),
+                    date_string
+                )
+            )
+        else:
+            self.log("info", "Unable to detect date for document")
+
+        return date

    def log(self, level, message):
        getattr(self.logger, level)(message, extra={
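The consolidated `get_date()` walks every `DATE_REGEX` match and keeps the first one that parses to a year strictly between 1900 and five years from now. That bound is why the new `test_crazy_date_*` tests expect `None` for years like 0590 and 2350.

The sketch below reproduces the bounding loop under stated assumptions:

- `datetime.date` and a simplified day-month-year regex stand in for `dateparser.parse()` and the real `DATE_REGEX`.
- The `DATE_ORDER` setting is ignored.

Both substitutions are for illustration only.

```python
import datetime
import re

# Simplified stand-in for DATE_REGEX: day-month-year with -, / or . separators.
SIMPLE_DATE_REGEX = re.compile(r"\b(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})\b")


def get_date(text, today=None):
    """Return the first plausible day-month-year date in `text`, or None."""
    today = today or datetime.date.today()
    max_year = today.year + 5  # same arbitrary five-year future limit

    for m in SIMPLE_DATE_REGEX.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            date = datetime.date(year, month, day)
        except ValueError:
            # Not a real calendar date (e.g. 31-02-2018): try the next match.
            continue
        if 1900 < date.year < max_year:
            return date
        # Bogus year such as 0590 or 2350: keep looking.
    return None


# Usage, pinned to the date of this commit so the output is reproducible.
release = datetime.date(2018, 10, 7)
print(get_date("01-07-2350 00:00:00", release))  # → None
print(get_date("31-02-2018, then 15/03/2018", release))  # → 2018-03-15
```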
--- a/src/paperless/checks.py
+++ b/src/paperless/checks.py
@@ -76,7 +76,12 @@ def binaries_check(app_configs, **kwargs):
    error = "Paperless can't find {}. Without it, consumption is impossible."
    hint = "Either it's not in your ${PATH} or it's not installed."

-    binaries = (settings.CONVERT_BINARY, settings.UNPAPER_BINARY, "tesseract")
+    binaries = (
+        settings.CONVERT_BINARY,
+        settings.OPTIPNG_BINARY,
+        settings.UNPAPER_BINARY,
+        "tesseract"
+    )

    check_messages = []
    for binary in binaries:
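This commit ties three pieces together:

- `binaries_check()` now also looks for `settings.OPTIPNG_BINARY`.
- That setting defaults to `optipng` unless `PAPERLESS_OPTIPNG_BINARY` overrides it.
- `optimise_thumbnail()` raises `ParseError` when the process exits non-zero.

The standard-library sketch below combines both runtime steps: it resolves each binary with `shutil.which()`, and it runs a command while treating a non-zero exit status as failure. `find_missing()`, `run_or_raise()` and `ToolError` are hypothetical names, not Paperless APIs.

```python
import os
import shutil
import subprocess
import sys


class ToolError(Exception):
    """Hypothetical stand-in for documents.parsers.ParseError."""


def find_missing(binaries):
    """Return the binaries that shutil.which() cannot resolve."""
    return [b for b in binaries if shutil.which(b) is None]


def run_or_raise(args):
    """Run an external tool; raise ToolError on a non-zero exit status,
    much like the check in optimise_thumbnail()."""
    returncode = subprocess.run(args).returncode
    if returncode != 0:
        raise ToolError(
            "{} exited with status {}".format(args[0], returncode)
        )


# Same lookup pattern as settings.py: an environment override, else a bare
# name that has to be found on $PATH.
optipng = os.getenv("PAPERLESS_OPTIPNG_BINARY", "optipng")
missing = find_missing([optipng, "tesseract"])
if missing:
    print("Can't find: {}".format(", ".join(missing)))

# A command that is guaranteed to exist and to succeed.
run_or_raise([sys.executable, "-c", "pass"])
```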
--- a/src/paperless/models.py
+++ b/src/paperless/models.py
@@ -1,15 +1,20 @@
+from django.contrib.auth.models import User as DjangoUser
+
+
 class User:
    """
    This is a dummy django User used with our middleware to disable
    login authentication if that is configured in paperless.conf
    """
+
    is_superuser = True
    is_active = True
    is_staff = True
    is_authenticated = True

-    # Must be -1 to avoid colliding with real user ID's (which start at 1)
-    id = -1
+    @property
+    def id(self):
+        return DjangoUser.objects.order_by("pk").first().pk

    @property
    def pk(self):
@@ -18,8 +23,8 @@ class User:

 """
 NOTE: These are here as a hack instead of being in the User definition
-  above due to the way pycodestyle handles lamdbdas.
-  See https://github.com/PyCQA/pycodestyle/issues/379 for more.
+NOTE: above due to the way pycodestyle handles lambdas.
+NOTE: See https://github.com/PyCQA/pycodestyle/issues/379 for more.
 """

 User.has_module_perms = lambda *_: True
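The reworked hack user's `id` property returns the primary key of the first real user (`DjangoUser.objects.order_by("pk").first().pk`). Admin edits made in login-free mode therefore point at a valid row.

One edge case is worth noting. `.first()` returns `None` when no users exist, so `.pk` would then raise `AttributeError`.

The sketch below shows the same lowest-id lookup against a hypothetical `auth_user` table in `sqlite3`, returning `None` instead of failing. The table layout and `first_user_id()` are assumptions.

```python
import sqlite3


def first_user_id(conn):
    """Return the lowest user id, or None if there are no users yet."""
    row = conn.execute("SELECT id FROM auth_user ORDER BY id LIMIT 1").fetchone()
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")
print(first_user_id(conn))  # → None (no users yet)

conn.executemany(
    "INSERT INTO auth_user (id, username) VALUES (?, ?)",
    [(3, "alice"), (1, "admin")],
)
print(first_user_id(conn))  # → 1
```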
--- a/src/paperless/settings.py
+++ b/src/paperless/settings.py
@@ -247,6 +247,9 @@ CONVERT_TMPDIR = os.getenv("PAPERLESS_CONVERT_TMPDIR")
 CONVERT_MEMORY_LIMIT = os.getenv("PAPERLESS_CONVERT_MEMORY_LIMIT")
 CONVERT_DENSITY = os.getenv("PAPERLESS_CONVERT_DENSITY")

+# OptiPNG
+OPTIPNG_BINARY = os.getenv("PAPERLESS_OPTIPNG_BINARY", "optipng")
+
 # Unpaper
 UNPAPER_BINARY = os.getenv("PAPERLESS_UNPAPER_BINARY", "unpaper")

--- a/src/paperless/version.py
+++ b/src/paperless/version.py
@@ -1 +1 @@
-__version__ = (2, 3, 0)
+__version__ = (2, 5, 0)
--- a/src/paperless_tesseract/parsers.py
+++ b/src/paperless_tesseract/parsers.py
@@ -4,7 +4,6 @@ import re
 import subprocess
 from multiprocessing.pool import Pool

-import dateparser
 import langdetect
 import pyocr
 from django.conf import settings
@@ -14,7 +13,7 @@ from pyocr.libtesseract.tesseract_raw import \
 from pyocr.tesseract import TesseractError

 import pdftotext
-from documents.parsers import DocumentParser, ParseError, DATE_REGEX
+from documents.parsers import DocumentParser, ParseError

 from .languages import ISO639

@@ -33,7 +32,6 @@ class RasterisedDocumentParser(DocumentParser):
    DENSITY = settings.CONVERT_DENSITY if settings.CONVERT_DENSITY else 300
    THREADS = int(settings.OCR_THREADS) if settings.OCR_THREADS else None
    UNPAPER = settings.UNPAPER_BINARY
-    DATE_ORDER = settings.DATE_ORDER
    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
    OCR_ALWAYS = settings.OCR_ALWAYS

@@ -46,15 +44,18 @@ class RasterisedDocumentParser(DocumentParser):
        The thumbnail of a PDF is just a 500px wide image of the first page.
        """

+        out_path = os.path.join(self.tempdir, "convert.png")
+
+        # Run convert to get a decent thumbnail
        run_convert(
            self.CONVERT,
            "-scale", "500x5000",
            "-alpha", "remove",
            "{}[0]".format(self.document_path),
-            os.path.join(self.tempdir, "convert.png")
+            out_path
        )

-        return os.path.join(self.tempdir, "convert.png")
+        return out_path

    def _is_ocred(self):

@@ -202,40 +203,6 @@ class RasterisedDocumentParser(DocumentParser):
        text += self._ocr(imgs[middle + 1:], self.DEFAULT_OCR_LANGUAGE)
        return text

-    def get_date(self):
-        date = None
-        datestring = None
-
-        try:
-            text = self.get_text()
-        except ParseError as e:
-            return None
-
-        # Iterate through all regex matches and try to parse the date
-        for m in re.finditer(DATE_REGEX, text):
-            datestring = m.group(0)
-
-            try:
-                date = dateparser.parse(
-                           datestring,
-                           settings={'DATE_ORDER': self.DATE_ORDER,
-                                     'PREFER_DAY_OF_MONTH': 'first',
-                                     'RETURN_AS_TIMEZONE_AWARE': True})
-            except TypeError:
-                # Skip all matches that do not parse to a proper date
-                continue
-
-            if date is not None:
-                break
-
-        if date is not None:
-            self.log("info", "Detected document date " + date.isoformat() +
-                             " based on string " + datestring)
-        else:
-            self.log("info", "Unable to detect date for document")
-
-        return date
-

 def run_convert(*args):

--- a/src/paperless_tesseract/tests/test_date.py
+++ b/src/paperless_tesseract/tests/test_date.py
@@ -384,3 +384,29 @@ class TestDate(TestCase):
            document.get_date(),
            datetime.datetime(2017, 12, 31, 0, 0, tzinfo=tz.tzutc())
        )
+
+    @mock.patch(
+        "paperless_tesseract.parsers.RasterisedDocumentParser.get_text",
+        return_value="01-07-0590 00:00:00"
+    )
+    @mock.patch(
+        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
+        SCRATCH
+    )
+    def test_crazy_date_past(self, *args):
+        document = RasterisedDocumentParser("/dev/null")
+        document.get_text()
+        self.assertIsNone(document.get_date())
+
+    @mock.patch(
+        "paperless_tesseract.parsers.RasterisedDocumentParser.get_text",
+        return_value="01-07-2350 00:00:00"
+    )
+    @mock.patch(
+        "paperless_tesseract.parsers.RasterisedDocumentParser.SCRATCH",
+        SCRATCH
+    )
+    def test_crazy_date_future(self, *args):
+        document = RasterisedDocumentParser("/dev/null")
+        document.get_text()
+        self.assertIsNone(document.get_date())
--- a/src/paperless_text/parsers.py
+++ b/src/paperless_text/parsers.py
@@ -1,11 +1,9 @@
 import os
-import re
 import subprocess

-import dateparser
 from django.conf import settings

-from documents.parsers import DocumentParser, ParseError, DATE_REGEX
+from documents.parsers import DocumentParser, ParseError


 class TextDocumentParser(DocumentParser):
@@ -16,7 +14,6 @@ class TextDocumentParser(DocumentParser):
    CONVERT = settings.CONVERT_BINARY
    THREADS = int(settings.OCR_THREADS) if settings.OCR_THREADS else None
    UNPAPER = settings.UNPAPER_BINARY
-    DATE_ORDER = settings.DATE_ORDER
    DEFAULT_OCR_LANGUAGE = settings.OCR_LANGUAGE
    OCR_ALWAYS = settings.OCR_ALWAYS

@@ -26,7 +23,7 @@ class TextDocumentParser(DocumentParser):

    def get_thumbnail(self):
        """
-        The thumbnail of a txt is just a 500px wide image of the text
+        The thumbnail of a text file is just a 500px wide image of the text
        rendered onto a letter-sized page.
        """
        # The below is heavily cribbed from https://askubuntu.com/a/590951
@@ -35,7 +32,7 @@ class TextDocumentParser(DocumentParser):
        text_color = "black"  # text color
        psize = [500, 647]  # icon size
        n_lines = 50  # number of lines to show
-        output_file = os.path.join(self.tempdir, "convert-txt.png")
+        out_path = os.path.join(self.tempdir, "convert.png")

        temp_bg = os.path.join(self.tempdir, "bg.png")
        temp_txlayer = os.path.join(self.tempdir, "tx.png")
@@ -46,9 +43,13 @@ class TextDocumentParser(DocumentParser):
            work_size = ",".join([str(n - 1) for n in psize])
            r = str(round(psize[0] / 10))
            rounded = ",".join([r, r])
-            run_command(self.CONVERT, "-size ", picsize, ' xc:none -draw ',
-                        '"fill ', bg_color, ' roundrectangle 0,0,',
-                        work_size, ",", rounded, '" ', temp_bg)
+            run_command(
+                self.CONVERT,
+                "-size ", picsize,
+                ' xc:none -draw ',
+                '"fill ', bg_color, ' roundrectangle 0,0,', work_size, ",", rounded, '" ',  # NOQA: E501
+                temp_bg
+            )

        def read_text():
            with open(self.document_path, 'r') as src:
@@ -57,7 +58,8 @@ class TextDocumentParser(DocumentParser):
                return text.replace('"', "'")

        def create_txlayer():
-            run_command(self.CONVERT,
+            run_command(
+                self.CONVERT,
                "-background none",
                "-fill",
                text_color,
@@ -65,14 +67,20 @@ class TextDocumentParser(DocumentParser):
                "-border 4 -bordercolor none",
                "-size ", txsize,
                ' caption:"', read_text(), '" ',
-                        temp_txlayer)
+                temp_txlayer
+            )

        create_txlayer()
        create_bg()
-        run_command(self.CONVERT, temp_bg, temp_txlayer,
-                    "-background None -layers merge ", output_file)
+        run_command(
+            self.CONVERT,
+            temp_bg,
+            temp_txlayer,
+            "-background None -layers merge ",
+            out_path
+        )

-        return output_file
+        return out_path

    def get_text(self):

@@ -84,40 +92,6 @@ class TextDocumentParser(DocumentParser):

        return self._text

-    def get_date(self):
-        date = None
-        datestring = None
-
-        try:
-            text = self.get_text()
-        except ParseError as e:
-            return None
-
-        # Iterate through all regex matches and try to parse the date
-        for m in re.finditer(DATE_REGEX, text):
-            datestring = m.group(0)
-
-            try:
-                date = dateparser.parse(
-                           datestring,
-                           settings={'DATE_ORDER': self.DATE_ORDER,
-                                     'PREFER_DAY_OF_MONTH': 'first',
-                                     'RETURN_AS_TIMEZONE_AWARE': True})
-            except TypeError:
-                # Skip all matches that do not parse to a proper date
-                continue
-
-            if date is not None:
-                break
-
-        if date is not None:
-            self.log("info", "Detected document date " + date.isoformat() +
-                             " based on string " + datestring)
-        else:
-            self.log("info", "Unable to detect date for document")
-
-        return date
-

 def run_command(*args):
    environment = os.environ.copy()
--- a/src/reminders/migrations/0002_auto_20181007_1420.py
+++ b/src/reminders/migrations/0002_auto_20181007_1420.py
@@ -0,0 +1,19 @@
+# Generated by Django 2.0.8 on 2018-10-07 14:20
+
+from django.db import migrations, models
+import django.db.models.deletion
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('reminders', '0001_initial'),
+    ]
+
+    operations = [
+        migrations.AlterField(
+            model_name='reminder',
+            name='document',
+            field=models.ForeignKey(on_delete=django.db.models.deletion.PROTECT, to='documents.Document'),
+        ),
+    ]
--- a/src/reminders/models.py
+++ b/src/reminders/models.py
@@ -4,7 +4,6 @@ from django.db import models
 class Reminder(models.Model):

    document = models.ForeignKey(
-        "documents.Document", on_delete=models.PROTECT
-        )
+        "documents.Document", on_delete=models.PROTECT)
    date = models.DateTimeField()
    note = models.TextField(blank=True)
Author	SHA1	Message	Date
Daniel Quinn	2ef2bf873e	Version bump: 2.5.0	2018-10-07 16:30:36 +01:00
Daniel Quinn	0bb7d27269	pep8	2018-10-07 16:30:02 +01:00
Daniel Quinn	ce5e8b2658	Rework user hack for "login-free" sessions #394	2018-10-07 16:27:41 +01:00
Daniel Quinn	3f572afb8b	Add a little more read-only info for documents	2018-10-07 16:26:05 +01:00
Daniel Quinn	5c3cb1e4ab	Rework how slugs are generated/referenced #393	2018-10-07 16:25:51 +01:00
Daniel Quinn	c7f4bfe4f3	Add migration that should have come in some time ago	2018-10-07 16:23:03 +01:00
Daniel Quinn	65d6599964	Fix formatting	2018-10-07 16:22:52 +01:00
Daniel Quinn	5d32e89c44	Wrap each document consumption in a transaction	2018-10-07 14:56:56 +01:00
Daniel Quinn	750ab5bf85	Use optipng to optimise document thumbnails	2018-10-07 14:56:38 +01:00
Daniel Quinn	2a3f766b93	Consolidate get_date onto the DocumentParser parent class	2018-10-07 14:56:02 +01:00
Daniel Quinn	14bb52b6a4	Wrap document consumption in a transaction #262	2018-10-07 13:12:22 +01:00
Daniel Quinn	b5176d207e	Hopefully fix Travis	2018-10-01 20:40:43 +01:00
Daniel Quinn	e4044d0df9	Update version number & changelog	2018-10-01 20:40:32 +01:00
Daniel Quinn	bacdd51fd7	Merge pull request #413 from euri10/master Fix issue where tesseract languages weren't installed properly	2018-10-01 19:40:04 +00:00
Daniel Quinn	8010d72f18	Tweak the date guesser to not allow dates prior to 1900 (#414)	2018-10-01 20:03:47 +01:00
euri10	9dd76f1b87	Fix issue where tesseract languages weren't installed properly	2018-09-24 13:30:10 +02:00