````diff
From dd9bbca84d933d125eeabff3fb2d7ec84afe9da1 Mon Sep 17 00:00:00 2001
From: John-Mark Gurney
Date: Thu, 23 May 2019 23:24:32 -0700
Subject: [PATCH] simplify things a bit, primarily only metadata objects will ever exist.

---
 README.md      | 35 ++++++++++++++++++++++++-----------
 sample/file.md | 31 +++++++++++++++++++++++++++----
 2 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 824e1a8..3647c8a 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,10 @@ This work is inspired by my work on STIX, a Cyber Threat Intelligence standard,
 9. i18n. Provide translations for fields as needed. Often movie titles will have different translations for different markets/languages. Actors may have different names (e.g. Chinese name vs English name).
 10. Overlaying/replacing meta data from someone else's object. This may include deleting properties. Say an actor is missing, or you want to add them to it, or you've encoded the DVD, and you just link to someone's BluRay version.
 
+## URN
+
+Each object has a URN which uniquely describes it. XXX copy from STIX URN proposal, which is similar to the magnet proposal.
+
 ## Types
 
 Everything must have a type. Not having well defined types can lead to confusion and problems. Different encoding schemes have different ways of encoding types. If the encoding scheme has a native way to encode that type, it should be used. In some cases, e.g. JSON, there is no formal types beyond numbers and strings, and in this case, a type should (MUST? or via schemas?) be layered on top.
@@ -33,6 +37,18 @@ Everything must have a type. Not having well defined types can lead to confusio
 
 Look at adding units.
 
+### Hash String
+
+The hash string is the name of the hash followed by a colon followed by the hex string.
+
+The list of valid hashes is:
+- sha256
+- sha512
+
+### Reference
+
+A reference is the UUID optionally followed by two dashes (--) followed by the modified date of the object. The modified date is necessary in some cases to know what version of the object is being referenced.
+
 ## Objects
 
 These are the nodes that contain a majority of the data.
@@ -40,19 +56,25 @@ These are the nodes that contain a majority of the data.
 ### Common Properties
 
 The following properties are present on all (most?) objects:
+type	The type of the object.
 producer_ref	UUID of the producer that created this object.
 Add signing info.
 
 ### MetaData Object
 
 Properties:
+type	'metadata'
 uuid	UUIDv4
-modified	date of last modification
+modified	date of last modification of the metadata object
 dc:	A [Dublin Core] property
 object_marking_refs	Imported from [STIX v2.0 Part 1]: Section 3.1
 granular_markings	Imported from [STIX v2.0 Part 1]: Section 3.1
+hashes	A list of hash strings.
 lang	RFC XXXX language of the properties.
 parent_ref	UUIDv4 of the parent MetaData Object. Any properties on this object override the parent. (allow deletion via None/null?) Any missing properties are passed through to the parent for resolution.
+mime-type	The mime-type. If the set of bytes is polymorphic, there should be one for each "type".
+uri	List of URIs where the file may be located.
+child_files	A dictionary where the keys are the file names and the values are hash strings. (One issue w/ using hashes is that you can't tie YOUR idea of the metadata, but it also allows a person to have metadata about a file that is private and not be forced to share it, nor create a dummy object.)
 
 Opinion Properties:
 qualityrating	On a scale from 1 (poor/terrible) to 5 (great/pristine), the subjective quality of the content.
@@ -69,21 +91,12 @@ If a property is imported from the blog itself, it is recommended to mark it as
 Open Questions:
 When meta data is "declassified", how do you maintain a link to the classified version?
 
-### Blob Object
-
-Properties:
-uuid	UUIDv4
-blobhash	Hash of the blob. This needs to be indexed
-metadata_ref	UUID of the MetaData Object
-
-This is the main mapping object. It maps a set of binary data to the MetaData object. All the data must be stored on the MetaData object. The reason it has a UUIDv4 is that this is your private mapping for the blog. You could possibly have multiple mappings, but most people will only have one, and this also allows you to publish your mapping, and coexist w/ other producer's mappings.
-
 ### File Object
 
 Properties:
+type	'file'
 uuid	UUIDv5 If the stats do not match, check hash, create a derivative blob object, possibly?
 modified	date of last modification of the object
-blobhash	Hash of the binary data.
 stat	Stats for the file, modified time, file size, used to detect when file has been changed/modified.
 
 A file object references a blob Object, and contains information about the file name in the file system associated w/ the blob. This is used to speed up looking up blob objects.
diff --git a/sample/file.md b/sample/file.md
index 0885a38..75025d8 100644
--- a/sample/file.md
+++ b/sample/file.md
@@ -1,25 +1,48 @@
 Sample structure for sharing file information.
 
-# Base file
+# Example object hierarchy
 
-secure hash of file
-file name? I don't think it should be part of this, as the set of bytes could have any name.
-metadata, e.g. code, mime-type, language, alt hashes
+file -> metadata
+
+# MetaData Object
+
+secure hash of data
+list or dict for hashes?
 
 ```
 {
 	'id': 'uuid',
+	'dc:author': 'example author',
+	'hashes': [ 'sha256:xxxx' ],
 	'hash': 'sha256:xxxx',
 	'length': 1234,
+	'uri': {
+		'https://www.example.com/a/path/filename.txt'
+	}
 	xxxmetadata
 }
 ```
 
+# Location of file
+
+```
+{
+	'id': 'uuid5',
+	'uri': [
+		'https://example.com/path/to/file',
+		'magnet:?xxx',
+		'ipfs:xxx'
+	]
+}
+```
+
 # Links to file from FS
 
 hostname + path
 link to base file
 
+Why not use a file URI w/ host part? There is no UUID host name
+
 How are these versioned? Are they? They need to be, via modified
 
 ```
````
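The hash string format this patch adds to README.md (the hash name, a colon, then the hex digest, with sha256 and sha512 as the only valid names) is used by the new `hashes` and `child_files` properties. The sketch below shows one way to build and validate such strings in Python. It is not code from the project: `make_hash_string` and `parse_hash_string` are hypothetical names, and requiring lowercase hex of the full digest length is an assumption, since the README only says "hex string".

```python
import hashlib

# The README lists only these two hash names as valid.
VALID_HASHES = ("sha256", "sha512")
HEX_DIGITS = set("0123456789abcdef")


def make_hash_string(data: bytes, name: str = "sha256") -> str:
    """Build a '<hash name>:<hex digest>' string for the given bytes."""
    if name not in VALID_HASHES:
        raise ValueError(f"unsupported hash: {name!r}")
    return f"{name}:{hashlib.new(name, data).hexdigest()}"


def parse_hash_string(value: str) -> tuple:
    """Split a hash string into (name, hex digest), rejecting malformed input."""
    name, sep, digest = value.partition(":")
    if not sep or name not in VALID_HASHES:
        raise ValueError(f"invalid hash string: {value!r}")
    # Assumption: the digest is lowercase hex of the algorithm's full length.
    expected_len = 2 * hashlib.new(name).digest_size
    if len(digest) != expected_len or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"bad {name} digest in {value!r}")
    return name, digest


if __name__ == "__main__":
    # Hypothetical child_files value: file name -> hash string.
    child_files = {"readme.txt": make_hash_string(b"hello\n")}
    for fname, hs in child_files.items():
        print(fname, parse_hash_string(hs))
```

Keeping the algorithm name inside the string lets a `hashes` list carry several digests of the same bytes (for example one sha256 and one sha512) without a separate field per algorithm.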
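The new "Reference" section defines a reference as a UUID, optionally followed by `--` and the object's modified date, so a reader can pin a specific version. A minimal Python sketch of parsing and formatting that form follows. The README does not say how the date is written; this assumes an ISO 8601 timestamp, and `Reference`, `parse_reference` and `format_reference` are hypothetical names.

```python
import uuid
from datetime import datetime
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    """An object's UUID plus, optionally, the modified date being referenced."""
    object_id: uuid.UUID
    modified: Optional[datetime] = None


def parse_reference(ref: str) -> Reference:
    # A canonical UUID only contains single dashes, so the first "--"
    # (if any) separates the UUID from the modified date.
    uuid_part, sep, date_part = ref.partition("--")
    object_id = uuid.UUID(uuid_part)  # raises ValueError if malformed
    if not sep:
        return Reference(object_id)
    # Assumption: the modified date is written as an ISO 8601 timestamp.
    return Reference(object_id, datetime.fromisoformat(date_part))


def format_reference(ref: Reference) -> str:
    """Render a Reference back into 'uuid' or 'uuid--modified' form."""
    if ref.modified is None:
        return str(ref.object_id)
    return f"{ref.object_id}--{ref.modified.isoformat()}"
```

Splitting on the double dash is unambiguous only because canonical UUIDs never contain two dashes in a row; a date format that could contain `--` would need a different separator.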
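The `parent_ref` property says that properties on a MetaData object override its parent's, and that missing properties are passed through to the parent for resolution. The Python sketch below walks that chain for a single property. It assumes a hypothetical in-memory `store` mapping UUID strings to MetaData dicts, and it treats an explicit `None` as deleting the inherited value, which the README only raises as an open question.

```python
from typing import Any, Dict, Optional

# Sentinel that distinguishes "property absent" from "property set to None".
_MISSING = object()


def resolve_property(
    store: Dict[str, Dict[str, Any]],
    obj: Optional[Dict[str, Any]],
    name: str,
    default: Any = None,
) -> Any:
    """Look up a property on a MetaData object, falling back along parent_ref.

    `store` maps UUID strings to MetaData objects (a hypothetical index).
    """
    seen = set()
    while obj is not None:
        value = obj.get(name, _MISSING)
        if value is not _MISSING:
            # Assumption: an explicit None deletes the inherited value
            # (the README leaves deletion via None/null as an open question).
            return default if value is None else value
        parent_id = obj.get("parent_ref")
        if parent_id is None:
            break
        if parent_id in seen:
            raise ValueError(f"parent_ref cycle through {parent_id}")
        seen.add(parent_id)
        obj = store.get(parent_id)  # None if the parent is not known locally
    return default


if __name__ == "__main__":
    store = {"parent-uuid": {"uuid": "parent-uuid", "dc:title": "Parent title", "lang": "en"}}
    child = {"uuid": "child-uuid", "parent_ref": "parent-uuid", "lang": "fr"}
    print(resolve_property(store, child, "lang"), resolve_property(store, child, "dc:title"))
```

Resolving lazily, one property at a time, means a child object only needs to carry the properties it changes, which fits the "overlaying/replacing meta data from someone else's object" goal listed earlier in the README.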
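The File Object gets a UUIDv5 and a `stat` property (modified time, file size) used to detect when a file has changed, and sample/file.md links files from the file system by hostname + path while asking whether a file URI with a host part should be used instead. The Python sketch below illustrates one possible reading. The README does not say what name the UUIDv5 is derived from; hashing a `file://` URI under the standard URL namespace is purely an assumption, and `file_object_id`, `stat_fields` and `needs_rehash` are hypothetical names.

```python
import os
import uuid
from urllib.parse import quote


def file_object_id(hostname: str, path: str) -> uuid.UUID:
    """Derive a deterministic UUIDv5 for a file at hostname + path.

    Assumption: the name hashed is a file:// URI under the standard URL
    namespace; the README does not say what the UUIDv5 is derived from.
    """
    uri = "file://" + hostname + quote(os.path.abspath(path))
    return uuid.uuid5(uuid.NAMESPACE_URL, uri)


def stat_fields(path: str) -> dict:
    """Record the stats the README mentions: modification time and size."""
    st = os.stat(path)
    return {"mtime_ns": st.st_mtime_ns, "size": st.st_size}


def needs_rehash(recorded: dict, path: str) -> bool:
    """True when the stored stats no longer match, so the file should be re-hashed."""
    return stat_fields(path) != recorded
```

Because UUIDv5 is name-based, the same host and path always yield the same file object id, so a re-scan can find the existing object and compare stats before paying for a full hash.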