SQLite3 does not do joins in a sane manner, so we emulate them
with subqueries for a large performance boost. Not sure whether adding
DISTINCT would improve things: the query plan does not change between
the two (though the lower-level ops may), and in a quick test it
didn't seem to make a difference (not evaluated statistically).
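
A minimal sketch of the join-to-subquery rewrite, for illustration only.
The files/tags schema and the data are hypothetical, not the project's
real tables. It shows both query shapes returning the same rows, and how
to pull the query plan for comparison.

```python
import sqlite3

# Hypothetical schema: files, plus one row per (file, tag) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (file_id INTEGER, tag TEXT);
    CREATE INDEX tags_tag ON tags (tag, file_id);
    INSERT INTO files VALUES (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg');
    INSERT INTO tags VALUES (1, 'cat'), (1, 'cat'), (2, 'dog'), (3, 'cat');
""")

# Join form: needs DISTINCT, as file 1 matches two tag rows.
join_sql = """
    SELECT DISTINCT files.name FROM files
    JOIN tags ON tags.file_id = files.id
    WHERE tags.tag = ? ORDER BY files.name
"""

# Subquery form: the join is emulated with IN (SELECT ...).
subquery_sql = """
    SELECT name FROM files
    WHERE id IN (SELECT file_id FROM tags WHERE tag = ?)
    ORDER BY name
"""

join_rows = conn.execute(join_sql, ("cat",)).fetchall()
subquery_rows = conn.execute(subquery_sql, ("cat",)).fetchall()

# The plan can be inspected the same way for either form.
plan = conn.execute("EXPLAIN QUERY PLAN " + subquery_sql, ("cat",)).fetchall()
```

Whether the subquery form is faster depends on the real schema and
indices, so compare plans and timings on actual data.
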
Add a class to emulate a file that stores only the parts of the file
that were read/accessed. This reduces storing an 11MB file down
to under 100KB. It also allows tests to run without the whole file.
Put the original files in fixtures/original.
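
A rough sketch of the idea, not the class from this commit. The names
RecordingFile and SparseFile are made up for illustration. One wrapper
records the byte ranges that get read, and a second object replays only
those ranges so a test can run without the original file.

```python
import io


class RecordingFile:
    """Wrap a binary file object and record every range that is read."""

    def __init__(self, fp):
        self._fp = fp
        self.chunks = {}  # offset -> bytes read starting at that offset

    def seek(self, pos, whence=io.SEEK_SET):
        return self._fp.seek(pos, whence)

    def tell(self):
        return self._fp.tell()

    def read(self, size=-1):
        pos = self._fp.tell()
        data = self._fp.read(size)
        # Keep the longest read seen at each offset.
        if len(data) > len(self.chunks.get(pos, b"")):
            self.chunks[pos] = data
        return data


class SparseFile:
    """Replay recorded chunks; reading anything else is an error."""

    def __init__(self, chunks):
        self._chunks = dict(chunks)
        self._pos = 0

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = pos
        elif whence == io.SEEK_CUR:
            self._pos += pos
        else:
            raise ValueError("SEEK_END is not supported in this sketch")
        return self._pos

    def tell(self):
        return self._pos

    def read(self, size):
        if size < 0:
            raise ValueError("read-to-EOF is not supported in this sketch")
        for start, data in self._chunks.items():
            off = self._pos - start
            if 0 <= off and off + size <= len(data):
                self._pos += size
                return data[off:off + size]
        raise ValueError("range at offset %d was never recorded" % self._pos)


# Record two small reads from a 1 MiB in-memory "file".
original = io.BytesIO(bytes(range(256)) * 4096)
rec = RecordingFile(original)
rec.seek(16)
header = rec.read(8)
rec.seek(1000)
body = rec.read(4)

# Only the bytes that were actually read need to be stored.
stored = sum(len(c) for c in rec.chunks.values())
```

Here rec.chunks is what would be saved as the fixture, and SparseFile is
what a test would open in place of the original.
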
Fix a couple of issues with parsing CRW files, and add the
ability to skip parts of the CRW file. This allows skipping
large parts, like the CCD data and the large thumbnail.

This still needs some cleanup and additional tests. It isn't
hooked into the testing system yet, as I still haven't decided
whether to commit the fixtures or not (or maybe make this its own
repo).

IFD needs serious cleanup. I should be using a classmethod instead
of the janky nextptr handling.

This imports magic.py from file-magic and merges magic_wrap.py into
it.
This also updates detect_from_filename to try with _COMPRESS first
and, if that returns an error, fall back to normal mode. This is
necessary as [some?] zip files can be decompressed by gzip, but doing
so throws an error.
The original query applied a complicated test, and SQLite couldn't
tell whether it applied to all.
In the case of any inclusion, it's easy: only search metadata, and
match to files.
If everything is an exclusion, make two parts: the part with a
metadata object that doesn't have the exclusions, and the part
without any metadata objects.
Both of these latter two queries can be satisfied more simply and
with proper indices.
The old query might have worked fine on a more advanced DB, but this
change was necessary for decent performance.
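
A sketch of the split described above, using a made-up schema (files,
objects, props) rather than the project's real one. The inclusion case
searches metadata first and then matches to files. The all-exclusion
case is the union of two simpler queries.

```python
import sqlite3

# Hypothetical schema: each metadata object belongs to one file and
# has zero or more tag properties. File 4 has no metadata object.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE objects (id INTEGER PRIMARY KEY, file_id INTEGER NOT NULL);
    CREATE TABLE props (object_id INTEGER NOT NULL, tag TEXT NOT NULL);
    CREATE INDEX objects_file ON objects (file_id);
    CREATE INDEX props_tag ON props (tag, object_id);
    INSERT INTO files VALUES
        (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg'), (4, 'd.jpg');
    INSERT INTO objects VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO props VALUES
        (10, 'cat'), (20, 'dog'), (30, 'cat'), (30, 'dog');
""")

# Any inclusion: search metadata only, then match to files.
include_sql = """
    SELECT name FROM files WHERE id IN (
        SELECT file_id FROM objects WHERE id IN (
            SELECT object_id FROM props WHERE tag = ?))
    ORDER BY name
"""

# All exclusion: files with a metadata object lacking the excluded
# tag, plus files with no metadata object at all.
exclude_sql = """
    SELECT name FROM files WHERE id IN (
        SELECT file_id FROM objects WHERE id NOT IN (
            SELECT object_id FROM props WHERE tag = ?))
    UNION
    SELECT name FROM files WHERE id NOT IN (SELECT file_id FROM objects)
    ORDER BY name
"""

included = conn.execute(include_sql, ("cat",)).fetchall()
excluded = conn.execute(exclude_sql, ("dog",)).fetchall()
```
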
My testing machine has 10 CPUs, and so didn't trigger the failure
where not all the work was submitted. We need to pop the completed
work items, and keep running the for loop while we have futures to
process. This submits and processes all work.
As we might have a lot of work to submit, but it might fail early,
don't submit too much work up front just for it to fail; instead,
limit how much work is submitted at once.
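
One way to get both behaviours (all work processed, limited work in
flight) is sketched below with concurrent.futures. This is an
illustration, not the code from this commit. run_bounded and its
parameters are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_bounded(func, items, max_workers=4, max_pending=8):
    """Run func over items, keeping at most max_pending futures queued.

    Results come back in completion order, not submission order.
    """
    results = []
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item in items:
            pending.add(pool.submit(func, item))
            if len(pending) >= max_pending:
                # Block until something finishes, then pop completed work.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    # result() re-raises a worker's exception, which
                    # stops further submission early.
                    results.append(fut.result())
        # Everything is submitted; drain whatever is still running.
        done, _ = wait(pending)
        for fut in done:
            results.append(fut.result())
    return results


squares = sorted(run_bounded(lambda x: x * x, range(20),
                             max_workers=2, max_pending=3))
```

The cap is independent of the CPU count, so a machine with many cores
exercises the same code path as one with few.
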
This will be used to allow parallel processing of torrent pieces.
Each piece of the torrent can be processed in parallel, and this
class will make sure that when processing the hash of a file in
the torrent, its data is hashed in the correct order.
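
A minimal sketch of such a class, assuming chunks are identified by a
sequential index. OrderedHasher, its method names, and the sha256
default are assumptions for illustration, not the class added here.
Chunks may arrive in any order; each is held until all earlier ones
have been fed to the hash.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class OrderedHasher:
    """Accept chunks in any order, but hash them in index order."""

    def __init__(self, algo="sha256"):
        self._hash = hashlib.new(algo)
        self._next = 0       # index of the next chunk to hash
        self._waiting = {}   # index -> data that arrived early
        self._lock = threading.Lock()

    def submit(self, index, data):
        with self._lock:
            self._waiting[index] = data
            # Feed the hash for as long as the next chunk is available.
            while self._next in self._waiting:
                self._hash.update(self._waiting.pop(self._next))
                self._next += 1

    def hexdigest(self):
        with self._lock:
            if self._waiting:
                raise ValueError("chunk %d never arrived" % self._next)
            return self._hash.hexdigest()


# Submit the chunks from worker threads, deliberately in reverse order.
chunks = [bytes([i]) * 1024 for i in range(16)]
hasher = OrderedHasher()
with ThreadPoolExecutor(max_workers=4) as pool:
    for i in reversed(range(len(chunks))):
        pool.submit(hasher.submit, i, chunks[i])
digest = hasher.hexdigest()
```

The cost is that early chunks stay in memory until the gap before them
is filled.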